PROTEIN ISOLATION AND ANALYSIS

The present invention relates to the isolation and analysis of proteins, in particular
by mass analysis. The invention has particular application to the isolation of binding
proteins such as antibodies. The invention also provides for modification of proteins
or protein fragments in order to facilitate mass analysis and/or the isolation of specific
proteins encoded by members of a gene library.

For the isolation of proteins, the invention provides new methods for isolating specific
proteins from a complex mixture of such proteins by virtue of binding to a specific
target. In particular, the invention provides methods for isolating specific antibody
domains from a gene library-derived mixture of such domains by virtue of binding to a
specific target antigen. For the analysis of proteins, the invention provides new
methods for analysing complex mixtures of proteins, especially to compare proteins
between two or more different samples.

For the isolation of proteins from complex mixtures by virtue of binding to a specific
target and where the identity or amino acid sequence of the protein is unknown
beforehand, it has usually been very difficult to isolate enough protein which binds to
the target for direct characterisation of the protein. In order to select a protein of
interest from a large library of natural, synthetic or semi-synthetic proteins, "protein
display" methods have been developed whereby recombinant proteins are produced
physically linked to their genes such that recovery of the proteins allows subsequent
rapid recovery of the genes. Such methods include "in-vivo" display methods such as
display on bacteriophage ("phage display"), bacteria and yeast, and include "in-vitro"
display methods such as display on ribosomes ("ribosome display"). The recovered
genes can be sequenced in order to determine the identity of the recovered protein or
can be used to regenerate the recovered protein. If a library of genes is subject to
protein display methods whereby proteins are selected for particular characteristics
such as binding to an antigen (for antibody variable regions), then at each selection
round, the recovered genes will be enriched for those encoding proteins exhibiting
such particular characteristics. Disadvantages of current "in-vivo" display methods
include a limit to the amount of functional protein displayed (phage display is usually
limited to polypeptides of less than 40 kDa), the usual need to fuse the recombinant
protein to a host protein (which may interfere with the function or binding of the
recombinant protein), and an inability to vary the number of proteins displayed per
display particle; the latter is also a problem with "in-vitro" display methods such as
ribosome display. In addition, methods for the selection of proteins with particular
characteristics such as binding to an antigen are limited due to the small sizes of the
display particles such that methods such as fluorescence activated cell sorting (FACS)
cannot readily be used. Thus, there remains a need for new methods to improve the
isolation of proteins from complex mixtures, in particular to improve the isolation of 
antibody variable regions (Fv's) from complex mixtures of Fv's. Thus the present
invention provides for improved methods for isolation of proteins from complex
mixtures. In particular, the present invention combines the use of protein libraries
generated from gene libraries with improvements in mass spectrometry and especially
improvements in matrix-assisted laser desorption/ionisation time-of-flight (MALDI-ToF)
spectrometry and the ability to directly sequence ToF-separated peptides by tandem
mass spectrometry (MS-MS) and, more recently, the ability to combine ToF
and MS/MS into one device (Q-ToF) and the ability to combine HPLC and
electrospray (ES) tandem mass spectrometry. The present invention also includes new
methods for screening for individual proteins from complex protein mixtures whereby
these proteins are not "displayed", i.e. bound to their corresponding genes, either during
or after binding to the target. The present invention also includes new methods for
screening for individual proteins from complex protein mixtures whereby neither the
proteins nor the target are "displayed", i.e. bound to any other molecule or structure.
The present invention also includes new methods for screening for individual proteins
from complex protein mixtures whereby the proteins and their corresponding genes
are linked together via the addition or inclusion of an "associating moiety" whereby
the proteins bind to the target either before or after addition of the "associating
moiety".

Thus in a first aspect, the present invention provides a method of protein
identification, screening and/or sequencing comprising providing a library of
individual proteins, one or more of which may bind to a target of interest, wherein
each individual protein includes in its sequence a "barcode" sequence, which can be
used to identify each individual protein in the library.

This aspect of the present invention provides for libraries of proteins, especially
recombinant antibody domains such as Fv's, whereby individual protein members of
the library include, within their amino acid sequence, a tract of sequence (a "barcode")
which can subsequently be sequenced in order to identify which protein(s) has bound
to the specific target (or, in the case of Fv's, "antigen"). This embodiment will apply
especially where the Fv's are derived from human genes whereby the selected Fv may
be suitable for human therapeutic or diagnostic use. In this particular application, an
extensive gene library of Fv's is created from a pool of immunoglobulin cDNA's such
as those derived from peripheral blood B cells in humans or such as pools created
synthetically using human variable regions with semi-randomised ("combinatorial")
CDRs (complementarity-determining regions) at one or more positions. If this gene
library is created in such manner that a random (or semi-random) gene sequence is
included within the Fv coding region or terminal to this region, then such a
random/semi-random gene sequence will generate a random/semi-random peptide
sequence associated with individual Fv's. Such a random/semi-random gene sequence
is created using standard methods such as oligonucleotide priming/DNA polymerase
extension or PCR whereby a random/semi-random synthetic oligonucleotide sequence
is used as one of a pair of primers used to amplify immunoglobulin gene fragments
during the creation of the Fv gene library. If members of the Fv library comprise two
chains (i.e. heavy and light chain-derived chains (VH and VL)) as opposed to a
single chain (VH and VL joined by a peptide linker), then individual barcodes can be
associated with each of the chains (or can be associated with one of the chains only).
Upon creation of the library, the resultant Fv's each include one or more "peptide
barcodes" unique to that particular Fv or to a small subset of Fv's from within the
complex library. Preferably the peptide barcode is C terminal to the single-chain Fv
region or C terminal to the VH or VL or both and includes, flanked between itself and
the Fv region, one or more protease sensitive sites such as sites for enterokinase
(cleaves after Asp-Asp-Asp-Asp-Lys), Factor Xa (cleaves after Ile-Glu/Asp-Gly-Arg)
or other endopeptidases. If a mixture of such Fv's is produced from a suitable gene
library, then this mixture is mixed with a target antigen (or antigens such as on cells),
usually where the antigen is immobilised. This results in specific Fv's binding to the
target antigen with non-binders (or weak binders depending on the stringency of
washing) being washed away. Having washed away excess antibodies, the barcode
peptide is then usually released from the Fv in the remaining antigen/Fv complex by
digestion with the endoprotease used to cleave the introduced protease sensitive site.
This released barcode peptide is then subjected to mass analysis / mass spectrometry
sequencing either directly or, if desired, following capture by virtue of specific amino
acids or amino acid sequences which allow the peptide to be captured onto a solid
phase, such as cysteine residues which can be biotinylated for subsequent capture on
immobilised avidin or streptavidin. Alternatively, any other method can be employed to
determine the sequence of the peptide barcode either within the Fv or after release,
including using specific ligands which bind to the barcode in a sequence-specific manner.
Having determined the sequences (or part-sequence) of barcodes derived from bound
Fv's, corresponding synthetic oligonucleotides are then produced and used to
specifically amplify or enrich for specific Fv genes from the library. These specific or
enriched Fv (or VH and VL) genes are then further used to generate corresponding
Fv's which could then be retested for antigen binding either individually or as part of a
small pool of isolated Fv's. Ultimately, by this method, specific Fv's can be generated
with desirable antigen binding properties and, if from a human source, potential
clinical utility. This aspect also encompasses the use of multiple barcodes associated
with individual proteins or Fv's, for example two adjacent barcodes at the C terminus
of Fv's whereby two peptides are released from each Fv by protease digestion, either
simultaneously in order to enhance the identification of Fv's which bind to the target, or
sequentially whereby different proteases are used in successive rounds of digestion to
provide a different means to subsequently amplify Fv genes corresponding to Fv's
which bind to the target. This aspect also encompasses the use of multiple barcodes
which are analysed at the same time in order to increase the diversity of overall
barcode sequences to provide specific coding of individual proteins. This aspect also
encompasses the use of barcodes within individual proteins, for example within one or
more CDR positions of an Fv. This aspect also encompasses the use of proteases
which might also digest the protein components of the protein:target mixture or,
additionally, any protein agent used to immobilise the target, with the proviso that the
barcode peptides released from the bound test protein(s) can still be detected and
sequenced within the background of other peptides. In the preferred format of this
aspect, a single region of barcode is provided at the C terminus of the light chains
forming a soluble Fab fragment whereby VHs and VLs are encoded by the same
expression cassette or cistron such that the barcode sequence can be used to access
both VH and VL genes. Such a Fab fragment can be conveniently produced using a
range of expression systems, for example the M13 bacteriophage vector system where,
by introduction of secretory leader sequences, the heavy and light chains of Fabs are
secreted into the periplasmic space of the host bacteria and harvested from that space.
The vector system is first prepared with in-frame barcodes by cloning in mixtures of
synthetic oligonucleotides. For the formation of two adjacent barcodes, this is
conveniently undertaken by sequential cloning or oligonucleotide mutagenesis
whereby pooled M13 recombinants containing the first mixed barcode are prepared as
a template for subsequent cloning of the second in-frame barcode. Preferably the
barcoding is designed such that the encoded protein contains endoprotease sites both
flanking and between the two barcodes and also whereby a "spacer" region adjacent to
one of the barcodes creates a peptide including that barcode which has a higher
molecular weight than the other barcode. By judicious design of barcodes and the use
of multiple barcodes in this manner, there is provided an option to simply analyse
masses of endoprotease-released peptides by, for example, MALDI-ToF whereby the
sequences of the peptides can be deduced (or near deduced) such that synthetic
oligonucleotides can be designed to isolate (or enrich) for the specific proteins with
the barcode(s) detected by MALDI-ToF analysis. Such deduction of these sequences
is achieved by design of sequences whereby specific amino acids only occur in one or
two positions along the peptide. For example, where the peptide is designed using 17
of the 20 natural amino acids (here designated A-Q), then the sequences might be
designed with options for any of three amino acids at each position along the peptide
sequence as follows:

This design would give a theoretical 6561 different peptide sequence barcodes. If an
adjacent barcode with a spacer region is also designed on the same basis, then this
would give an additional 6561 different barcodes. In combination, this would create
4 x 10^7 barcode sequences which would be adequate to uniquely tag most members of a
protein library of such size. The use of additional adjacent barcodes or longer
barcodes based, for example, on use of two specific amino acids at any position in the
sequence (thus creating 262,144 different barcode sequences using 19 amino acids)
would increase the diversity of barcodes provided. In practice, codon redundancy is
reduced through the judicious choice of codons at each position in the sequence during
design of mixed synthetic oligonucleotides. One design of oligonucleotide for an 8
amino acid barcode peptide for MS/MS sequencing is as follows:

Codons      -  NAC  NCC

Amino acids -  N    T
               D    P
               H    A
               Y    S

where codons N = A, C, G or T
             K = G or T
             V = A, C or G

4 x 4 x 3 x 3 x 2 x 3 x 4 x 4 = 13824 barcode sequences
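
The codon design and the diversity figures quoted above can be checked with a short script. The following is a minimal sketch, assuming the standard genetic code and the degenerate-base symbols N, K and V as defined above. Only the two codon columns that appear above (NAC and NCC) are expanded, and the function name `expand` is illustrative rather than part of the described method.

```python
from itertools import product
from math import prod

# Degenerate base symbols as defined in the text above.
DEGENERATE = {"A": "A", "C": "C", "G": "G", "T": "T",
              "N": "ACGT", "K": "GT", "V": "ACG"}

# Standard genetic code, written in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def expand(degenerate_codon):
    """Return the set of amino acids encoded by a degenerate codon."""
    choices = [DEGENERATE[base] for base in degenerate_codon]
    return {CODON_TABLE["".join(codon)] for codon in product(*choices)}


nac = expand("NAC")  # amino acids available at the NAC position
ncc = expand("NCC")  # amino acids available at the NCC position

# Diversity figures quoted in the text.
three_option_barcodes = 3 ** 8                # 8 positions, 3 options each
two_adjacent_barcodes = three_option_barcodes ** 2
oligo_design = prod([4, 4, 3, 3, 2, 3, 4, 4])  # options per position

print(sorted(nac), sorted(ncc))
print(three_option_barcodes, two_adjacent_barcodes, oligo_design)
```

The same `expand` function could be applied to any further degenerate codons built from N, K and V to confirm how many amino acid options each position contributes.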

[...] sequence is deduced by MALDI-ToF [...] candidate barcode [...] nested within a
gene [...]

[...] This first aspect of the invention can cover a number of variations [...] the
underlying principle [...] a specific protein [...] a library of such proteins, via mass
or sequence analysis of one or more [...] associated with or encoded by that specific
protein [...] complex mixtures which demonstrate certain binding properties such as
binding to other macromolecules such as DNA can be detected using the present method.
The present method includes a variety of ways for adding peptide barcodes to proteins
including methods where the barcode is encoded within the gene fragment encoding
the protein. However, barcodes can be added to such proteins or to any other suitable 
mixture of molecules by direct attachment of peptides. For example, specific peptides 
can be added to specific antibodies or proteins using a range of chemical or 
photochemical methods. One application of such a method is to label one complex
mixture of proteins with one barcode (or selection of barcodes, for example with
different protein specificities) and another barcode to an alternative complex mixture
of proteins, for example to differentially barcode proteins from two different samples
which are then mixed. It will be understood by those skilled in the art that the
principle of adding peptide barcodes to proteins or other molecules could also be
applied to non-peptide barcodes whereby such barcodes can be directly identified (or
nearly identified) using mass or sequencing methods. As such, the barcodes could
include nucleic acid barcodes attached to proteins or other molecules including nucleic
acids. As with peptide barcodes, such nucleic acid barcodes can be analysed by mass
spectrometry to provide an accurate estimate of mass. Such barcodes might be
released from the proteins or other molecules using restriction enzymes instead of
proteases.
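
The mass-based identification of a released peptide barcode described for this aspect can be sketched in a few lines. The example below is illustrative only: the Fv fragment, the barcode sequences and the function names are hypothetical, the residue masses are standard monoisotopic values, and cleavage is modelled simply as cutting immediately after the enterokinase recognition sequence Asp-Asp-Asp-Asp-Lys.

```python
# Standard monoisotopic residue masses (Da) of the 20 natural amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # H2O accounting for the free N and C termini


def peptide_mass(sequence):
    """Monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def release_barcode(protein, site="DDDDK"):
    """Return the peptide C-terminal to the last cleavage site, or None."""
    position = protein.rfind(site)
    if position < 0:
        return None
    return protein[position + len(site):]


def match_by_mass(observed_mass, candidates, tolerance=0.01):
    """Return candidate barcodes whose calculated mass fits the observed mass."""
    return [c for c in candidates
            if abs(peptide_mass(c) - observed_mass) <= tolerance]


# Hypothetical Fv fragment: binding region + enterokinase site + barcode.
fv = "EVQLVESGG" + "DDDDK" + "NTDPHAYS"
released = release_barcode(fv)
observed = peptide_mass(released)  # stands in for a measured MALDI-ToF mass
hits = match_by_mass(observed, ["NTDPHAYS", "DPHAYSAA", "YSYSYSYS"])
print(released, round(observed, 3), hits)
```

Barcodes with the same amino acid composition have the same mass, so mass alone identifies a barcode only when the design restricts each amino acid to one or two positions, as described for this aspect; otherwise sequencing of the peptide is needed.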

It will be understood by those skilled in the art that, within the scope of the present
invention, there are applications of the first aspect other than in the isolation of
proteins. For example, the distribution of proteins or other ligands within a live
organism can be analysed by analysis, by mass or by sequence, of barcodes which are
associated with specific organs within the organism. In the analysis of peptide or
protein binding specificity to other molecules, barcodes can be constructed as part of
the peptide or protein binding regions in order to analyse specificity by mass or
sequence analysis of barcodes. For example, mixed peptide barcode sequences can be
constructed around known anchor residues of MHC molecules and the spectrum of
peptides which bind to specific MHC molecules then determined by elution and mass
or sequence analysis of the barcode.

In a second aspect, the present invention provides a method of screening a protein
library comprising screening said library for one or more desired properties, followed
by dereplication to identify one or more individual proteins in the library having the
desired property.


This aspect of the present invention provides for libraries of proteins, especially
recombinant antibodies such as Fv's, whereby individual members of the libraries are
isolated for binding to specific targets whereby pools of proteins from the library are
screened individually and then positive pools are subjected to one or more rounds of
dereplication until the individual proteins in the library which bind to the target are
identified. Specifically, this aspect relates to screening protein libraries without use of
a display system, i.e. where there is no physical association of the proteins with their
corresponding genes. In this aspect, pools of proteins are screened for binding to the
target whereby either the target is labelled to indicate which pool(s) contain proteins
which bind, or the target is detected without labelling. A particularly favoured
method is to screen pools of proteins in solution without any fusion or attachment to
other moieties (which might influence the binding of proteins to their targets) and then
to precipitate the total protein pool (together with any attached target) prior to mass
analysis, especially via MALDI-ToF, in order to screen for a "fingerprint" of ionised
peaks which is representative of the target and therefore indicates if the target has
bound. Once one or more positively-binding pools are identified, these can be then
dereplicated either to reduce the complexity of the pool or to segregate out individual
proteins for screening for binders to the target. In practice, a particularly favourable
way of assembling pools of proteins is firstly to assemble pools of genes encoding
these proteins. If genes are cloned into plasmid or phage vectors, for example, these
can be pooled by mixing together individual bacterial colonies or plaques, or more
conveniently by segregating pools of colonies/plaques by plating onto separate agar
plates (at densities such as 1000 colonies/plaques per plate) and scraping/eluting
colonies/plaques from these plates into one mixture which is then used for synthesis of
the proteins either through bacterial/phage expression or through in vitro
transcription/translation. In a similar manner, other microorganisms or in vitro
synthesis systems could be used for synthesis of proteins. This aspect also
encompasses the use of complex targets such as mixtures of molecules, whole cells or
cell membranes whereby the molecular target yields a mass analysis "fingerprint"
which is characteristic for binding to a specific molecular target within the complex
target. This aspect also encompasses, where the target is a protein, the use of proteases
to digest the target(s) in order to produce a peptide mass fingerprint indicative of the
target and which, where the protease also digests the protein(s) from the library, can
still be detected even within a background of other peptides derived from the library.
This aspect also encompasses a range of different types of "target" and criteria for
selection of pools of proteins or individual proteins other than by binding to a target.
For example, the aspect encompasses the use of biological assay systems as a criterion
for selection of proteins, for example where proteins are selected for the ability to
stimulate or inhibit a biological activity. Other formats of binding assays would
include inhibition of binding of a ligand to its receptor and selection for proteins which
bind to certain locations on a target where the target might be, for example, a
molecule, cell or tissue section.
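
The pool screening and dereplication of this second aspect can be illustrated with a small simulation. This is a sketch under stated assumptions, not the method as practised: the clone names, the halving strategy and the `simulated_assay` callback (standing in for a mass-analysis readout on a pooled sample) are all hypothetical.

```python
def dereplicate(pool, pool_binds):
    """Split positive pools in half until single binding clones remain.

    pool       -- list of clone identifiers in the pool
    pool_binds -- assay callback: True if the pool shows the target fingerprint
    """
    if not pool or not pool_binds(pool):
        return []
    if len(pool) == 1:
        return list(pool)
    middle = len(pool) // 2
    return (dereplicate(pool[:middle], pool_binds)
            + dereplicate(pool[middle:], pool_binds))


# Hypothetical library of 1000 clones, two of which bind the target.
library = [f"clone{i}" for i in range(1000)]
true_binders = {"clone17", "clone512"}
assay_log = []  # records the size of every pool that was assayed


def simulated_assay(pool):
    """Stand-in for screening one pool for the target fingerprint."""
    assay_log.append(len(pool))
    return any(clone in true_binders for clone in pool)


binders = dereplicate(library, simulated_assay)
print(binders, len(assay_log))
```

Halving is only one possible dereplication scheme; pools could equally be split into plates of a fixed size, as with the colony/plaque plating described above. The point of the sketch is that far fewer assays are needed than testing every clone individually.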

In a third aspect, the present invention provides a method of protein identification
and/or sequencing comprising providing a library of individual proteins, one or more
of which may bind to a target of interest, wherein each individual protein, together
with its gene, is bound to an "associating moiety".

This aspect of the present invention provides for libraries of proteins, especially
recombinant antibodies such as Fv's, whereby the proteins and their corresponding
genes are linked together via the addition or inclusion of an "associating moiety"
whereby the proteins or Fv's bind to the target either before or after addition of the
"associating moiety". The associating moiety serves the purpose of enabling
regeneration of the proteins or Fv's via the associated corresponding gene, for
example by PCR amplification (or other means of amplification such as via bacterial
[...] or by direct sequencing and subsequent regeneration via [...]
sequence). Where the proteins or Fv's are generated as a pool with a corresponding
pool of genes, then genes associated with the proteins or Fv's which bind to the target
[...] for regeneration of [...]
or smaller pools of proteins or Fv's in order to repeat screening to identify the specific
proteins or Fv's (via the corresponding genes) which bind to the target.

A particular format for this third aspect is where the associating moiety is a particle and
[...] proteins and their corresponding genes are co-immobilised on
particles whereby recovery of an individual particle provides for identification of the
[...] recombinant protein. This format particularly relates to
methods whereby genes encoding the recombinant proteins are co-immobilised on the
same particle as their corresponding proteins such that upon selection for the
recombinant protein, the corresponding gene will also be selected such that the
[...] protein [...] determined (by sequencing the gene) or such
[...] The method of the
invention includes provisions to control the amount of proteins displayed on the particle,
commonly by controlling the number of moieties on the particle to which the
recombinant proteins bind. The invention includes provisions to co-display other
molecules on the particle in conjunction with the recombinant protein including other
proteins or protein chains and including molecules to which the recombinant proteins
bind such as antigens.

In the basic operation of the third aspect of the present invention, there is provided an
[...] synthesised recombinant proteins
using methods such as in vitro transcription and translation or phage display, such
proteins being exemplified by antibody variable regions (Fv's). Subsequently, genes
and recombinant proteins are co-immobilised on particles, one or more ligands [...]
associated with the gene either as DNA or mRNA whereby such ligands become
[...] or whereby such [...] reacted with
the particle surface to produce a covalent or ionic attachment. Alternatively, the gene
is directly immobilised on the particle via formation of one or more covalent or ionic
bonds to natural DNA or RNA reactive groups. The resultant recombinant proteins
encoded by the genes may have one or more ligands associated (such ligands being
moieties on the proteins by which immobilisation can be achieved) such as protein
sequence tags (encoded by the genes) or biotin groups (incorporated by in vitro
transcription and translation using biotinyl lysine) such that they too can become
[...] or whereby such ligands are reacted with
the particle surface to produce a covalent or ionic attachment. The ligands on the
genes and proteins can either be the same or different ligands with immobilisation on
the same or different receptors. For useful operation of this aspect of the invention,
genes or pools of genes, either as DNA, mRNA or within a live microorganism such as
a phage, are distributed into arrays (or multiple reaction vessels etc.) and recombinant
proteins are produced in such arrays (for example by in vitro transcription and
translation or by growth of phage). Master arrays containing the genes can be used as
the source of material for generating the recombinant proteins whereby samples of
genes or proteins are dispensed into server arrays such that the array location for each
gene or protein pool is preserved. Either before, during or after this process, one or
more particles are introduced into each position in the array, providing receptors to
which genes and proteins can bind. In one variation of the invention, the genes are
attached to the particles at the outset and proteins produced directly from these genes
such that these recombinant proteins are subsequently immobilised onto the same
particle. Either before or following attachment of recombinant proteins to the
particles, the proteins can be optionally subjected to modification, for example
phosphorylation by kinases or binding by other proteins. In a variation of the
third aspect, the arrays include droplets such as oil-in-emulsion droplets or liposomes
into which genes or live microorganisms are segregated (usually by producing the
droplets prior to protein synthesis and thus arraying the genes within droplets).
Proteins are produced within the droplets and these are then co-attached to the
particles including the genes. In the case of droplets, the particle to which the genes
and proteins co-attach can either be introduced into the droplet or the particle can be
the droplet itself. For example, in the case of liposomes, the proteins could be
produced with lipophilic tags which combine with the liposome membranes,
especially where this leads to "display" of the proteins on the outside surface of the
liposome. A related example is where in vitro translation of mRNA is used, where
microsomal membranes can be introduced into the reaction whereby proteins with
lipophilic tags can integrate into such membranes which can subsequently be
dispersed into small particles.

If it is desirable to then pool the particles for a selection process, particles are then
retrieved from the arrays and mixed; the recombinant proteins on the particles are then
subjected to selection, typically by exposure to a target which binds to selected
proteins on the particles. Certain recombinant proteins could also be subjected to
modification at this stage. Particles holding selected or modified proteins could then
be retrieved by a variety of methods; for example, if the target is labelled with a
fluorescent label, FACS could be used to separate out particles with (or without) the
target. In the first major aspect of the present invention, genes encoding recombinant
proteins on such selected particles could then be recovered by, for example, PCR
amplification of the co-immobilised DNA or mRNA.
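
The selection step described above, in which pooled particles carrying co-immobilised genes and proteins are sorted by bound fluorescent target and the genes on the selected particles are then recovered, can be sketched as a simple data model. All names, gene identifiers, fluorescence values and the threshold below are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Particle:
    gene_id: str         # gene co-immobilised with its protein on the particle
    fluorescence: float  # signal from labelled target bound to the protein


def sort_particles(particles, threshold):
    """Split pooled particles into target-positive and target-negative sets."""
    positive = [p for p in particles if p.fluorescence >= threshold]
    negative = [p for p in particles if p.fluorescence < threshold]
    return positive, negative


def recover_genes(particles):
    """List the distinct genes carried by the selected particles."""
    return sorted({p.gene_id for p in particles})


# Hypothetical pool: two particles carry the same binding Fv gene.
pooled = [Particle("fv_001", 12.0), Particle("fv_002", 850.0),
          Particle("fv_003", 8.5), Particle("fv_002", 910.0)]
positive, negative = sort_particles(pooled, threshold=100.0)
selected_genes = recover_genes(positive)
print(selected_genes)
```

Because each particle carries both the protein and its gene, selecting on the protein's binding signal is enough to identify the gene; in the laboratory the recovery step would correspond to amplification of the co-immobilised DNA or mRNA.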

There are many types of "associating moieties" for linking proteins with their
corresponding genes which could be used in the third aspect of the present invention.
Particles of use include latex and magnetic particles, and particles onto which
synthetic oligonucleotides are synthesised directly. Such particles would commonly
be provided with a "receptor" to which the synthesised polypeptides can bind. Other
associating moieties may be single molecules or molecular complexes which can act
as a bridge to join the gene molecules to the synthesised proteins. For example, both
the gene molecules and synthesised proteins can include biotin groups which could
then be cross-linked by addition of streptavidin whereby streptavidin acts as the
associating moiety. In a similar fashion, a sequence tag on the proteins and a ligand
on the gene molecules can be cross-linked using, for example, a bispecific binding
reagent such as a bispecific antibody (binding to both sequence tag and ligand) or an
antibody-streptavidin conjugate (whereby the antibody binds either to a ligand on the
protein or gene and the streptavidin binds to biotin on the protein or gene, whichever is
non-liganded). Other associating moieties may be bacteria or bacteriophage whereby
the synthesised polypeptide binds to a specific ligand on the bacteria or bacteriophage.
For example, an M13 expression system can be used to produce an Fv fragment of a
specific antibody in E. coli which can then bind to a specific protein antigen on the
M13 itself, especially where this is displayed on the phage head fused to a capsid
protein. By testing for M13 phage to which Fv has bound, the gene encoding the
specific Fv can be determined by sequencing the Fv gene encoded by the M13.
Similarly, the M13 expression system can be used to produce a protein which binds to
a specific protein displayed on the M13 itself. In every case, the unique feature of the
third aspect is that the recombinant protein molecules become attached, after
synthesis, to the corresponding genes via an associating moiety. Such attachment after
synthesis especially allows for the unhindered synthesis of the protein molecules
without, for example, the need to be synthesised as a fusion with other protein
molecules which could alter the protein conformation or interfere with its recognition
or function.

The present invention includes several methods to generate the recombinant proteins
prior to linkage to the associating moiety. These methods especially include protein
synthesis by in vitro transcription and translation, and protein synthesis in bacteria
directed by plasmids or phage. In the latter case, the present invention provides the
advantage that the generated protein need not be fused with a phage protein as the
generated protein in the present invention is subsequently immobilised onto a separate
particle. In contrast to current methods for phage display of proteins where such
proteins are fused to a surface phage protein or protein which can reach the surface,
the third aspect of the present invention would require either lysis of the phage or
secretion or leakage of the recombinant protein from the phage head in order to
provide for its subsequent immobilisation onto the particle. Other in vivo methods of
generating proteins such as expression in bacteria, yeast or even mammalian cells
could thus also be used in the third aspect which therefore has the advantage of being
more versatile than individual display methods. Thus, recombinant proteins could be
modified by a particular host, for example glycosylated by mammalian cells, prior to
immobilisation. One particularly useful aspect of the present invention is the ability to
control the numbers of molecules of recombinant protein on the associating moiety,
especially when this is a particle, by control of the number of "receptor" molecules on
the particle. In the case of antibody variable regions therefore, the valency of
individual or pools of antibodies can be varied according to selection criteria. A
further alternative associating moiety could be a live cell itself whereby the
recombinant protein is linked to a ligand on or near the surface of the live cell such as
a cell surface marker of the bacterium or mammalian cell harbouring the expression
plasmid or whereby, upon secretion, the protein would then bind to the cell from
which it was expressed. The protein could then be reacted with target and cells
harbouring the expression cassette for the specific Fv binding the target could be
isolated.

The third aspect herein provides a particularly useful means for selection of
recombinant proteins which bind to a target or for selection of recombinant proteins
which are modified by a specific treatment, for example by treatment with cell or
tissue lysates. The method accordingly will prove especially useful for the molecular
evolution of recombinant proteins whereby successive rounds of selection ensure
recovery only of proteins with stringent properties such as high affinity binding to a
target. The method can also encompass successive rounds of mutagenesis of selected
genes to maximise the diversity for evolutionary selection. It will be apparent to those
skilled in the art that there are many variations which could be employed based on the
third aspect of the present invention but falling within the scope of the present
invention. For example, associating moieties, especially particles used to capture the
genes and recombinant proteins, could themselves be bound by another polypeptide
chain whereby, when protein-protein binding occurs, the recombinant protein is not
captured by the particle directly but rather by the polypeptide chain already on the
particle. An appropriate tag or ligand on the recombinant protein can then be used to
provide a means for detecting the protein-protein binding event. In the same manner,
particles could be bound by synthetic oligonucleotides which are subsequently used to
anneal to the genes as a means to capture them on the particles.

In a fourth aspect, the present invention provides a method of protein identification
and/or sequencing comprising providing a library of individual proteins, one or more
of which may bind to a target of interest, wherein each individual protein is attached to
an individual "coding moiety".



In this aspect of the present invention, recombinant proteins synthesised from a gene
library are subsequently attached to "coding moieties" such as particles which are
distinguishable through one or other coding methods in such a manner that the coding
relates to the identity of the gene which encodes a recombinant protein attached to the
particle. Where the recombinant proteins are immobilised on coded particles, the
recombinant proteins may have one or more ligands associated such as protein
sequence tags or biotin groups such that they can become bound to a "receptor" on the
coded particle surface or whereby such ligands are reacted with the particle surface to
produce a covalent or ionic attachment. In the operation of this aspect of the present
invention, recombinant proteins or pools of proteins are synthesised or segregated in
large arrays. Particles with unique codes are then introduced into each position in the
array. Such codes include, for example, different ratios of measurable signalling
moieties such as fluorescent, chemiluminescent or radioactive labels or different
physical features which distinguish particles such as different shapes or markings, for
example a code or unique mark etched into the particle. In each case, individual coded
particles can be distinguished from each other. Particles of use in the present
invention include any such particles, complexes or molecules with the property that
proteins can be attached. Following pooling of particles and binding of the mixtures
of proteins on coded particles to a specific target, the coding of selected particles could
then be determined in order to determine their original array positions and hence the
array loci of genes encoding the selected recombinant proteins. As a variation of these
aspects of the invention, selected proteins on particles could be identified directly
using methods such as MALDI-TOF (mass spectroscopy) or using labelled antibodies
to identify known proteins. The operation and scope of this fourth aspect of the
present invention will share many aspects and scope of the above third aspect of the
invention.
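
By way of illustration only, the decoding step described above can be sketched as a
lookup from a measured particle code to an array position. The sketch assumes,
hypothetically, that each code is a ratio of two fluorescent label intensities and that
codes were assigned in a known order to a 96-position array; the function names, the
evenly spaced code values and the tolerance are assumptions made for the example and
are not taken from the present specification.

```python
from typing import Dict, Optional, Tuple

Well = Tuple[str, int]  # (row letter, column number), e.g. ("A", 1)


def build_code_table(rows: str = "ABCDEFGH", cols: int = 12) -> Dict[float, Well]:
    """Assign a hypothetical, unique label ratio to every array position."""
    table: Dict[float, Well] = {}
    index = 0
    for row in rows:
        for col in range(1, cols + 1):
            # Hypothetical coding scheme: evenly spaced ratios 0.10, 0.20, ...
            ratio = round(0.10 * (index + 1), 2)
            table[ratio] = (row, col)
            index += 1
    return table


def decode_particle(signal_a: float, signal_b: float,
                    table: Dict[float, Well],
                    tolerance: float = 0.04) -> Optional[Well]:
    """Return the array position whose code is closest to the measured ratio."""
    if signal_b <= 0:
        return None
    measured = signal_a / signal_b
    best = min(table, key=lambda code: abs(code - measured))
    if abs(best - measured) > tolerance:
        return None  # no code is close enough; the particle cannot be assigned
    return table[best]


table = build_code_table()
# A selected particle with label intensities 130 and 1000 has a ratio of 0.13,
# which lies closest to the code 0.10 assigned to the first array position.
print(decode_particle(130.0, 1000.0, table))
```

Once a selected particle has been assigned to an array position in this way, the gene
held at that position would be taken as encoding the selected recombinant protein.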






In a fifth aspect, the present invention provides a method for analysing
mixtures of proteins comprising:

(i) digestion or cleavage of the protein mixture;

(ii) fractionation of the resultant peptides; and

(iii) analysis of the resultant peptides by means of their mass and/or sequence.

This aspect of the present invention relates to methods for analysing mixtures of proteins.
In particular, the invention relates to methods to compare proteins between different
cells and tissues. The invention involves the combination of digestion or cleavage of
protein mixtures, fractionation of peptides using a library of protein binding reagents,
and subsequent analysis of peptide fractions for mass or sequence. The invention
includes optional physical fractionation of proteins or peptide fragments additional to
fractionation with protein binding reagents. Current methods to analyse en masse
complex mixtures of proteins such as in mammalian cells or tissues require that the
proteins are separated by technologies such as two dimensional (2D) gel
electrophoresis. For this technology, cellular proteins are usually separated on the
basis of charge in one dimension and on the basis of size in the other dimension.
Proteins can either be identified with reference to the electrophoresis migration pattern
of a known protein or by elution of the protein from the electrophoretically separated
spot and analysis by methods such as mass spectrometry and nuclear magnetic
resonance. However, limitations of the 2D protein gel method include the limited
resolution and detection of proteins from a cell (typically only 5000 cellular proteins
are clearly detected), the limitation to identification of separated proteins (for example
mass spectrometry usually requires 100 fmoles or more of protein for identification),
the specialist nature of the technique and the difficulty in automating the technique in
order to achieve very high protein analysis throughputs. There is thus a need for
superior methods to analyse complex mixtures of proteins en masse especially using
methods without gel electrophoresis and methods which are easy to automate.

The core of the fifth aspect is that proteins are either digested or cleaved into smaller
peptide fragments and then fractionated using a library of protein affinity reagents and
then subjected to mass analysis especially by mass spectroscopy. Optionally, proteins
or peptide fragments may be fractionated physically in addition to being fractionated
with protein affinity reagents and may also be conjugated with one or more "chemical
tags" to assist in fractionation.

The major aspect of the fifth aspect provides for cleavage of proteins using proteases
or chemical methods, fractionation of the peptide mixture thereby produced and
subsequent mass analysis. Fractionation of peptides is achieved using protein affinity
reagents, especially libraries of recombinant antibody fragments. Optionally, the
method includes additional fractionation of proteins or peptides using physical
methods or specific affinity reagents such as antibodies or solid phases or reactive
chemical groups to isolate peptides or mixtures of peptides for subsequent mass
analysis. Protein affinity reagents are used to retrieve individual peptides or sets of
peptides from the peptide mixture for subsequent mass analysis. Alternatively or
additionally, protein affinity reagents can be used to eliminate peptides from the
mixture whereby the mixture is itself subsequently subjected to mass analysis. The
protein affinity reagents can either bind by virtue of specific sequences or structures in
peptides or by virtue of specific chemical groups either as natural constituents of the
peptides or as chemical tags which are added to the peptides either before or after
cleavage.
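
The three steps of cleavage, affinity fractionation and mass analysis can be sketched
in silico as follows. This is a minimal illustration under stated assumptions: the
protease is modelled by the commonly used trypsin rule (cleavage C-terminal to lysine
or arginine unless the next residue is proline), the affinity reagent is modelled,
hypothetically, as binding any peptide that contains a given short sequence, and the
example protein sequence is invented. None of the function names is taken from the
present specification.

```python
import re
from typing import List

# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N and C termini


def cleave_trypsin(sequence: str) -> List[str]:
    """Step (i): cleave after K or R, except where the next residue is P."""
    # Splitting on an empty match requires Python 3.7 or later.
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def affinity_select(peptides: List[str], epitope: str) -> List[str]:
    """Step (ii): keep peptides bound by a hypothetical sequence-specific reagent."""
    return [p for p in peptides if epitope in p]


def peptide_mass(peptide: str) -> float:
    """Step (iii): monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[res] for res in peptide) + WATER


protein = "AGKPLRSTEKMFHR"  # invented sequence for illustration
peptides = cleave_trypsin(protein)
print(peptides)  # ['AGKPLR', 'STEK', 'MFHR']
for bound in affinity_select(peptides, "TE"):
    print(bound, round(peptide_mass(bound), 4))
```

The same selection function could equally be used in the opposite sense, to remove
bound peptides and pass the remainder of the mixture to mass analysis.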






For analysis of larger mixtures of peptides, panels of protein affinity reagents such as
those provided by recombinant libraries of antibody Fv fragments (including
single-chain Fv's) can be used in order to isolate subsets of peptides for subsequent analysis.
Such panels of Fv's will include a wide range of peptide specificities which could be
achieved, for example, by pre-absorbing antibody libraries on the peptide samples of
interest or by immunising animals with peptide samples of interest and generating
recombinant Fv libraries from the animal B cells. Alternatively, polyclonal antisera or
panels of monoclonal antibodies could be prepared from immunised animals and used
to fractionate peptides. Then individual or mixtures of the selected antibodies are used
to isolate (or eliminate) the specific subsets of peptides from a test sample.
Subsequent mass analysis of a range of peptides can facilitate the detection of
differences in specific proteins between test samples.
Generation of recombinant Fv's or antibodies to all peptides in a mixture is difficult
and is highly dependent on the number of peptides in a mixture and the facility for
individual peptides to be bound with reasonable affinity to antibodies ("antigenicity").
With a very large peptide mixture, a limitation is redundancy whereby antibodies with
the same peptide specificities are repeatedly represented whilst antibodies to other
peptide specificities are underrepresented or absent. This may cause a particular
protein not to be mass analysed if none of the peptides from a particular protein are
bound by an antibody. Therefore, a particularly useful method is to isolate N or C
terminal peptides (or both) from a protein by preabsorption of the protein to a solid
phase via its N and/or C terminus prior to cleavage or by chemical tagging of the N
and/or C terminus for subsequent isolation after cleavage. In principle, this then
should lead to recovery of all N and/or C terminus peptides representing all proteins
from the sample. Such isolation of N and/or C terminal peptides is greatly facilitated
by the differential reactive nature of the N terminal amino group and the C terminal
carboxyl group in the protein compared to internal amino and carboxyl groups. Such
isolated N and/or C terminal peptides can be further fractionated using other affinity
reagents which either recognise specific peptide sequences or which recognise
chemical tags on the peptides or further fractionated by physical means such as HPLC.
Such isolated N and/or C terminal peptides are then fractionated using protein affinity
reagents prior to mass analysis. The invention also allows for sequential conjugation
of different chemical tags to the protein / peptide mixture especially where N or C
termini are sequentially exposed by specific cleavage of the protein / peptide and
whereby the N or C termini (or both) are conjugated with a specific chemical tag upon
exposure of that terminus. This aspect of the invention therefore provides for a series of
protein fractions with a range of conjugated chemical tags introduced at the termini,
such fractions being isolated using an affinity reagent which binds to the tag. As a
particularly useful method as an alternative to a chemical tag at the terminus of the
protein molecule, chemical tags can also specifically be attached to non-terminus
amino acids such that internal peptides can be isolated via an internal chemical tag.
Unique chemistries are available for attachment of ligands to several specific amino
acids, for example to the ε-amino groups of lysines, the thiol groups of cysteines and
the carboxyl groups of aspartic and glutamic acids. One advantage of isolating
peptides by virtue of non-terminal tags is that selection can be made for larger peptides
which are more likely to contain a specific amino acid to which a tag is attached, thus
isolating peptides with a mass which exceeds low molecular weight masses with a
larger background noise during mass analysis. Another advantage is the array of
reagents already available to introduce chemical tags onto specific amino acids within
proteins or peptides especially reagents which provide a biotin tag.

Another embodiment of the fifth aspect provides for sequential cycles of protein
cleavage using proteases or chemical methods with fractionation with protein affinity
reagents either during or following successive protein cleavage steps and subsequent
mass analysis. In this case, the analysis of protein mixtures is assisted by sequential
cleavage cycles whereby the spectrum of proteins and peptides is fractionated with
the protein affinity reagents and analysed following each cleavage cycle. This method
could also include chemical tagging cycles between cleavage cycles to increase the
mass or steps to remove side-groups such as carbohydrate groups in order to reduce
mass. If the mass of the range of protein fragments is then determined at the end of
each cleavage cycle (either with or without chemical tagging, cleavage or other
modification), then a range of mass distributions will be obtained for each cycle. With
an appropriate series of mass modification cycles, the result for a single protein or a
mixture will be a mass spectrum of protein/peptide fragments which is altered at
successive cycles; the pattern of these alterations will provide a "fingerprint" for the
specific proteins/peptides in the mixture. The appearance and disappearance of a
particular protein/peptide fragment of a certain mass following specific cleavage
cycles with or without chemical tagging, cleavage or other modifications will provide
a fingerprint for identification of the fragment sequence especially by reference to a
database of such fingerprints. Comparison of the spectrum of protein/peptide
fragments from different related samples then allows for the identification of
protein/peptide fragment differences between these samples. Particularly useful in this
aspect of the present invention are proteases which specifically recognise two amino
acids and cleave the protein as a result. Examples of such proteases are the
prohormone convertases which cleave between dibasic amino acid pairs. Therefore,
the fifth aspect of the present invention provides for novel ways of analysing protein
mixtures using a combination of protein digestion or cleavage, fractionation using
protein affinity reagents and mass analysis.

In a related aspect of the fifth aspect, proteins are fractionated prior to cleavage. For
large protein mixtures, particularly those isolated directly from whole cells or tissues,
the pre-fractionation of proteins may be desirable in order to reduce the complexity of
mixtures subjected to subsequent cleavage, peptide fractionation and mass analysis.
Whilst protein affinity reagents which bind sequences or structures in the
proteins/peptides directly are primarily useful, an alternative or an addition is to use a
library of chemical tags to provide moieties bound by a set of protein affinity reagents.
More conventional means of pre-fractionation include the use of gel electrophoresis
either in one or two dimensions where sections of the gel are isolated and the proteins
within then subjected to cleavage and mass analysis. Other pre-fractionation methods
include isolation of proteins by virtue of natural modifications such as
phosphorylation, glycosylation, protein-protein (or peptide) interaction; alternatively,
membrane proteins can be pre-fractionated or proteins from particular compartments
within the cell. Another important pre-fractionation procedure is to remove highly
abundant proteins from the mixture using affinity reagents such as antibodies to bind
and remove such proteins. As an alternative to pre-fractionation, peptides generated
after cleavage can also be fractionated by many of these means and also including
size/charge fractionation methods using HPLC. Such methods are particularly useful
to fractionate peptides which have already been selected from a mixture through the
application of protein affinity reagents. In particular, HPLC can be interfaced with
mass analysis such that peptide fractions from HPLC separation are directly subjected
to mass analysis. Peptides generated after cleavage can also be fractionated by virtue
of natural modifications using, for example, antibodies which bind phosphorylated
amino acids within peptides. Pre-fractionation of proteins may also be achieved by
using protein affinity reagents such as monoclonal/polyclonal antibodies to isolate
specific proteins for subsequent cleavage and mass analysis. For such analysis of
larger mixtures of proteins, libraries of antibodies such as those provided by
recombinant libraries of Fv's are preferred in order to isolate subsets of proteins or
subsets of cleaved peptides for subsequent mass analysis. Such a library of antibodies
will include a wide range of protein or peptide specificities but can also be
pre-enriched for binding to proteins/peptides of interest in the particular sample of interest.
For peptides, this is preferably achieved by testing individual Fv's for selective
binding to a single or a small number of peptides in the sample. Alternatively,
pre-enrichment can be achieved by pre-absorbing antibody libraries on the mixed
protein/peptide sample of interest and then using individual or mixtures of the selected
antibodies in order to isolate subsets of proteins or peptides. Fractionation with
protein affinity reagents provides mass spectra for a range of different protein/peptide
fractions thus facilitating detection of differences in specific proteins between
samples.

A further advantage of the use of chemical tags is that the subsequent fractionation of
peptides by affinity reagents can greatly reduce the number of selected peptides from a
protein molecule with the rest of the molecule thus being eliminated from the mass
analysis. An especially convenient method for selective chemical tagging is to tag
either (or both of) the N and C terminus of the protein molecules in the mixture and
then to digest or cleave the protein molecules with a reasonably selective reagent such
as an amino acid or sequence-specific protease (such as endopeptidase Arg-C) or
cleavage reagent (such as acid pH to cleave at Asp-Pro). Using an affinity reagent, N
or C terminal peptides (or both) from the original protein could then be isolated and all
internal peptides discarded. This reduction in complexity is then sufficient for mass
analysis especially using HPLC coupled to a tandem mass spectrometer to analyse the
peptides en masse in order to identify the individual peptides from the mixture.
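
The reduction in complexity obtained by terminal tagging can be illustrated with a
short sketch. It assumes, for illustration only, that both termini of the intact
protein carry a tag before cleavage, that cleavage occurs C-terminal to every
arginine (a simplified model of endopeptidase Arg-C) and that the affinity reagent
recovers every peptide bearing either tag. The data structure, function names and
example sequence are hypothetical.

```python
from typing import List, NamedTuple


class Peptide(NamedTuple):
    sequence: str
    n_tagged: bool  # carries the tag placed on the original N terminus
    c_tagged: bool  # carries the tag placed on the original C terminus


def cleave_after_arg(sequence: str) -> List[str]:
    """Simplified Arg-C model: cleave C-terminal to every arginine."""
    fragments: List[str] = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue == "R":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])
    return fragments


def digest_tagged_protein(sequence: str) -> List[Peptide]:
    """Tag both termini of the intact protein, then cleave.

    Only the first fragment keeps the N terminal tag and only the last
    fragment keeps the C terminal tag; internal fragments are untagged.
    """
    fragments = cleave_after_arg(sequence)
    last = len(fragments) - 1
    return [Peptide(f, i == 0, i == last) for i, f in enumerate(fragments)]


def isolate_terminal(peptides: List[Peptide]) -> List[Peptide]:
    """Affinity step: keep tagged peptides and discard internal peptides."""
    return [p for p in peptides if p.n_tagged or p.c_tagged]


digest = digest_tagged_protein("MSTRAAKRLLGRWQE")  # invented sequence
print([p.sequence for p in digest])                    # four fragments
print([p.sequence for p in isolate_terminal(digest)])  # ['MSTR', 'WQE']
```

In this model each protein contributes at most two peptides to the mixture passed on
for mass analysis, regardless of how many internal cleavage sites it contains.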

Alternatively, chemical tagging could be performed only after digestion/cleavage, for
example with the dibasic cutters, the prohormone convertases. This would provide for
tagging only at one or more internal sites of the original proteins. If the protein
mixture is then subjected to a second digestion/cleavage step with a different enzyme
or cleaving reagent, then the size of the tagged peptides would be reduced where a
cleavage site was present in the original protein. The tagged peptides could then be
fractionated using protein affinity reagents and subjected to mass analysis.

In another embodiment of the fifth aspect, a protein mixture is subjected to cycles of
tagging, digestion/cleavage and mass analysis, whereby fractionation by protein
affinity reagents and mass analysis is performed only on an aliquot of the mixture
resultant from use of an affinity reagent binding to the specific chemical tag and
whereby the master mixture is then subjected to tagging with a different chemical tag
and digestion/cleavage. This provides sequentially a range of different fragments.
Another variation on the method involves the same initial steps as above but, having
exposed new N and C termini after cleavage, one (or both) of these new termini can
then optionally be tagged with a different chemical which thus [illegible] the
original protein. If required, the process could be repeated one or more times with a
different protease or cleavage reagent, each time with the addition to the N or C
terminus of a different chemical tag. In one format of the method, the whole mixture
of proteins would first be tagged with two different chemical groups at each of the N
and C terminus and then cleaved with a protease, such as one which specifically cuts
adjacent to a specific amino acid, and tagged again at the new N and C termini with
two further different chemical groups. This would result in a mixture of peptides each
with chemical tags at the termini. As the N and C terminal peptides would have a
specific tag, these could then be isolated from the mixture using appropriate affinity
reagents. Internal peptides without either the initial N or C terminal tags could be
isolated using their specific tags. The process of digestion and tagging could then be
repeated to create further peptides with tags. Using specific combinations of affinity
reagents for specific tags, N or C terminal or specific internal peptides from the
original protein could then be isolated and selected peptides discarded to achieve a
reduction in complexity. Where chemical tags are added to two or more amino acid
side groups within peptides, sequential use of affinity tags could isolate sections of
peptides containing specific combinations of amino acids. For example, if a mixture
of peptides of average length of 20 amino acids is separately tagged at lysine and
phenylalanine and the mixture comprises 25% of peptides which include neither lysine
nor phenylalanine, 25% with lysine only, 25% with phenylalanine only and 25% with both,
then the separate or sequential use of specific affinity reagents either for lysine or
phenylalanine will result in fractionation of peptides into four equal fractions. In
practice, such a fractionation scheme will favour the binding of larger peptides to
affinity reagents as these peptides are more likely to contain one or more of the
specific amino acids tagged. This will bias against the very small peptides such as
those with molecular weights less than 1000 daltons which, when subjected to mass
spectrometry analysis, will be more likely to coincide with background noise due to
fragmented peptides and other small molecules.
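
The four-way fractionation in the lysine and phenylalanine example can be sketched as
two sequential affinity steps. The sketch assumes, hypothetically, that every lysine
(K) and every phenylalanine (F) in a peptide carries its tag and that each affinity
reagent binds any peptide bearing at least one copy of its tag; the function name,
fraction labels and example peptides are invented for illustration.

```python
from typing import Dict, List


def sequential_fractionation(peptides: List[str],
                             first: str = "K",
                             second: str = "F") -> Dict[str, List[str]]:
    """Split peptides into four fractions using two affinity steps in sequence."""
    # Step 1: the reagent for the first tag separates bound from unbound peptides.
    bound_first = [p for p in peptides if first in p]
    unbound_first = [p for p in peptides if first not in p]

    # Step 2: the reagent for the second tag is applied to each of the two pools.
    return {
        "both": [p for p in bound_first if second in p],
        "first_only": [p for p in bound_first if second not in p],
        "second_only": [p for p in unbound_first if second in p],
        "neither": [p for p in unbound_first if second not in p],
    }


mixture = ["AGKLFE", "SSKTTA", "GGFDDA", "AELLSS"]  # invented peptides
for name, members in sequential_fractionation(mixture).items():
    print(name, members)
```

With the invented mixture above each fraction receives exactly one peptide,
corresponding to the four equal fractions of the example; a longer peptide is more
likely to contain a tagged residue and so to be retained in one of the bound fractions.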

Where analysis of complex protein mixtures is required such as in mammalian cells or
tissues, the present invention provides a main method where proteins are fractionated
using protein affinity reagents either before or after cleavage and the peptides are then
mass analysed. The fractionation of a complex mixture of proteins or peptides
requires a correspondingly complex mixture of protein affinity reagents and can be
assisted by one or more additional affinity reagents which can recognise features of
the proteins/peptides which are the basis for fractionation. Where cleavage is
conducted prior to fractionation, the most common method used in the present
invention is to cleave the whole protein mixture with a protease such as trypsin or V8
(Glu-C) protease and to then selectively isolate and mass analyse certain peptides.
Commonly, N or C terminal peptides (or both) from the peptide mixture are isolated,
typically by adding a chemical tag to the N and/or C terminus of the proteins prior to
cleavage and using an affinity reagent which isolates peptides with the chemical tag.
Alternatively, specific peptides (N / C terminal or otherwise) can be isolated using
affinity reagents which have been selected for binding to specific peptides within
specific proteins; these will then select out those peptides from the mixture. For more
complex mixtures of proteins, a further fractionation step such as HPLC fractionation
based on size, charge or hydrophobicity is preferred prior to mass analysis especially
as this can be interfaced with mass analysis. Selective isolation of peptides then
allows for comparative analysis of specific peptides derived from alternative protein
mixtures for their relative quantities (relating to relative levels of the proteins in their
respective mixtures) and, in certain cases, for modifications of the peptides.

For fractionation of N or C terminal peptides, the preparation and use of protein
affinity reagents is an important aspect of the present invention and the labelling of the
N or C terminus of proteins is another important aspect. With a typical mixture of
proteins from mammalian cells or tissues or from many living organisms, several of
the N termini of these proteins (and some C termini) will be modified (for example by
methylation) such that addition of a chemical tag to the terminus may be blocked. In
addition, in a typical mixture of proteins from mammalian cells or tissues or from many
living organisms, the proteins will occur at different relative levels of abundance
including, commonly, certain highly abundant proteins. Where protein mixtures
from mammalian cells or tissues or from other living organisms are used for the initial
selection of protein affinity reagents, such highly abundant proteins may dominate
selection of affinity reagents and may be predominant in the final peptide mixture for
mass analysis. A solution to both of these problems is to use an artificial source of
mixed proteins to isolate the affinity reagents. Typically, this will be a gene
expression system whereby a gene (usually cDNA) library is used to generate the
proteins without N or C terminal modifications. In addition, the use of a gene
expression system allows the gene library to be "normalised" to reduce or remove
highly abundant genes within the library. This is typically achieved by self-annealing
of the DNA (or RNA) prior to constructing the library. Therefore, a common method
in the present invention is to generate proteins by expression of gene libraries (usually
normalised) resulting in proteins free from significant N or C terminal modifications
and, where normalised, resulting in a protein mixture free from domination by specific
proteins. A typical expression system used with gene libraries is in vitro transcription
and translation using a eukaryotic ribosome preparation; this also provides the
possibility of incorporating modified amino acids into the expressed proteins. The
expressed protein mixture can then be used directly for N or C terminal labelling.
Other expression systems could also be used where N terminal amino groups or C
terminal carboxyl groups are not modified or prevented from subsequent chemical
tagging. Where modification occurs, in some cases the N terminal modification can
be removed either using enzymes such as histone deacetylase or chemical methods
such as limited cyanogen bromide cleavage to remove N terminal methionines.
Having produced a mixture of proteins free from N/C terminal modification, chemical
tags can then be added to the N/C terminal amino group(s). For the N terminus, the
ε-amino group of lysines can be initially blocked using reagents such as citraconic
anhydride or methyl acetimidate to then allow only the N terminal amino groups to
react. Alternatively, the ε-amino group of lysines can be blocked by incorporating
modified lysines into the expression system such as in vitro transcription / translation
whereby, for example, biotin-modified lysines can be directly incorporated instead of
lysines. Chemical tags can then be added selectively to the N terminus of proteins, for
example using isothiocyanates of specific molecules to which an affinity reagent is
available. One such example is fluorescein which is incorporated by reaction of the
proteins with fluorescein isothiocyanate allowing subsequent purification with
anti-fluorescein antibodies. Alternatively, polycarboxylic chelating agents can be
incorporated as isothiocyanates allowing subsequent purification with specific metals.
Once the N and/or C termini of proteins in the mixture are tagged, the protein is then
comprehensively and specifically cleaved either chemically or enzymatically, using
proteases such as trypsin or another cleaving agent. Such cleavage thereby releases
from each protein an individual tagged terminal peptide fragment, such collection of
fragments which can then be purified from the mixture of untagged peptides using an
appropriate affinity reagent such as an antibody specific for the chemical tag. If
required, the size of the chemical tag can be increased in order to produce a larger
mass for analysis; this would be useful for peptide fragments resulting from cleavage
very close to the chemical tag whereby the resultant fragment might be so small as to
be mass analysed within lower molecular weight "noise". The chemical tag might, for
example, comprise a piece of nucleic acid attached to the peptide via a reactive group
introduced during synthesis of the nucleic acid. Such a nucleic acid molecule might
also be useful for isolation of the tagged peptide via annealing of the nucleic acid to a
complementary sequence.

Following chemical tagging and isolation, the recovered mixture of N/C terminal 
30 peptides are then used as a "bait" for the isolation of protein affinity reagents to bind to 
these same peptides fixtm proteins derived directly from mammalian cells or tissues or 
from other living organisms. Such affinity reagents will typically derive from a Ubrary 
of recombinant Fv*s displayed as part of a particle containing' the corresponding gene 
encoding the antibody. Examples of such particles are ribosome display particles or 
35 phage display particles, in each' case where the genes from selected antibodies can be 
rescued in order to propagate those specific antibodies. As an alternative, large arrays 
of antibodies (such as recombinant single chain or Fabs, Fvs) can be screened using 
the N/C terminal peptide mixture and antibodies which display binding to the peptides 
can be recovered via the corresponding genes. As another alternative, N and/or C 
40 terminal peptides could be used to directly generate polyclonal or monoclonal 

antibodies by appropriate immunisation of an animal. By these means, a hbrary of 
protein affinity reagents is selected which can then be used for the analysis of mixtures 
of proteins such as from mammalian cells or tissues or from other living organisms. 
Such analysis can either involve using the library of affinity reagents to select out N/C
terminal peptides from proteins derived from mammalian cells or tissues or from other
living organisms, or using individual affinity reagents to select out individual peptides.
The selected peptides can then be mass analysed, typically by MALDI-ToF (matrix-
assisted laser desorption/ionisation time-of-flight), where the individual peptides give
individual charge:mass ratios which can then be used to identify the peptide amino
acid constituents. MS-MS (double mass spectroscopy) peptide sequencing can
subsequently be used to identify the peptide if it can be isolated. Alternatively, the
new generation of Quadrupole-ToF LC-MS-MS ("Q-ToF") instruments can provide
for sequential MALDI-ToF and MS-MS within the same instrument. Indeed, protein
affinity reagents either individually or in mixtures can be immobilised either indirectly
or directly onto the desorption chip inserted into the MALDI-ToF instrument and
peptides can be subsequently bound via the affinity reagents on the chip. In this way
multiple peptide fractions adsorbed by multiple affinity reagents at different loci can
be analysed on a single chip. The use of recombinant proteins as the "bait" to isolate
protein affinity reagents also provides the prospect of attaching other tags to those
proteins whereby the tags are encoded by the gene sequence; for example a C
terminal polyhistidine tag (allowing subsequent purification of the tagged fragments
using nickel chelates) could be incorporated, for example through PCR-mediated
incorporation into the gene sequences.
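The link between a peptide's amino acid constituents and the mass:charge value read from a MALDI-ToF spectrum can be sketched as below. This is only an illustration, assuming an unmodified linear peptide and standard monoisotopic residue masses; the function names are hypothetical and are not part of the described method.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free N and C termini
PROTON = 1.00728   # mass of the charge carrier in positive-ion mode


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER


def mass_to_charge(sequence, charge=1):
    """m/z of the protonated peptide carrying `charge` protons."""
    return (peptide_mass(sequence) + charge * PROTON) / charge


if __name__ == "__main__":
    for pep in ("IEGR", "LEGR", "DDDDK"):
        print(pep, round(mass_to_charge(pep), 4))
```

Note that leucine and isoleucine have identical masses, so a mass value alone constrains the amino acid composition rather than the exact sequence, which is one reason the text follows MALDI-ToF with MS-MS sequencing.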

The use of recombinant proteins as the "bait" to isolate protein affinity reagents also
provides another common method of the fifth aspect of the present invention for
specifically isolating peptides using tags encoded by the recombinant proteins. Such
tags can be conveniently incorporated into members of a gene (usually cDNA)
library during its construction or into individual clones or groups of clones thereof
using specific PCR primers encoding such tags and designed to incorporate such tags
into the resultant expressed proteins. Preferably, such tags will be incorporated into
the expressed proteins in all reading frames in order to produce a productively tagged
protein. Such tags will preferably be incorporated via the downstream primer of a
PCR reaction with the usual result that the tag is produced towards the C terminal end
of the expressed protein (although upstream termination codons may prevent this in
some clones). However, tags may also be incorporated at the N terminal end or at
both N and C termini.
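As a rough illustration of incorporating a tag via the downstream primer, the sketch below builds the sense-strand layout (end of the coding sequence, tag codons, stop codon, restriction site) and returns its reverse complement as the primer. The gene fragment, the choice of stop codon and the function names are assumptions made for the example, and the reading frame is taken to be known.

```python
# Lower- and upper-case complement table for DNA.
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def downstream_tag_primer(gene_3prime_end, tag_codons, stop="taa", site=""):
    """Design a downstream (antisense) primer that appends a tag.

    The sense strand is laid out as: in-frame end of the gene, tag codons,
    stop codon, optional restriction site. The primer anneals to that
    strand, so it is the reverse complement of the layout.
    """
    sense = gene_3prime_end + tag_codons + stop + site
    return reverse_complement(sense)


if __name__ == "__main__":
    his6 = "cac" * 6                      # six histidine codons
    gene_end = "ggagagtca"                # hypothetical in-frame gene 3' end
    primer = downstream_tag_primer(gene_end, his6, stop="taa", site="gaattc")
    print("5'-" + primer + "-3'")
```

Because the tag is only expressed if it is in frame with the upstream coding sequence, a library whose frames are unknown would need one such primer per frame, which is the point of the "all reading frames" remark above.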

For the isolation of specific peptides from a peptide mixture, the peptide sequences
can be produced synthetically (or via recombinant DNA) and then, as above, used as
the "bait" to capture specific protein affinity reagents. These affinity reagents can then
be used to isolate these same peptides from a cleaved protein mixture derived from,
for example, mammalian cells or tissues or from other living organisms.

As an alternative to selectively fractionating N or C terminal peptides or specific
internal peptides, modified peptides can be isolated, such as peptides including
phosphorylated amino acids which can be isolated using antibodies which selectively
bind to phosphorylated amino acids (tyrosine, threonine or serine or combinations
thereof) or using immobilised Fe3+ to trap negatively charged peptides. Similarly,
peptides modified by glycosylation and other modifications can be isolated, in some
cases where the peptide modification is further derivatised in order to facilitate
isolation. For example, carbohydrates can readily be modified via periodate reactions
as an intermediate to adding chemical tags such as fluorescein. A particularly
important aspect of the invention is the fractionation of selectively modified peptides
whereby such peptides are selectively tagged by virtue of their differential exposure to
tagging within the original protein environment prior to cleavage. For example,
surface exposed proteins on living cells can be selectively tagged, for example with
biotin, by treating the cells with a tagging agent which preferentially reacts with
specific amino acid groups. An indirect method for achieving such tagging in proteins
which are naturally tagged via other stimuli within cells is to apply such stimuli in
order to effect tagging of the proteins. For example, receptor-associated tyrosine
kinase molecules within cells can potentially be tagged (for example, phosphorylated)
by addition of the receptor ligand to those cells. Following modification, peptides are
released from proteins by cleavage and then directly mass analysed or subjected to
fractionation with protein affinity reagents as above prior to mass analysis.

Mass analysis of proteins and peptides by the present invention is preferably
performed using mass spectroscopy. In particular, MALDI-ToF analysis has the
capability to very accurately measure specific mass:charge ratios for individual
peptides. This method has the capability for simultaneous analysis of thousands of
peptides. Above 4kD, the resolution of individual peptides (and proteins) becomes
poorer such that cleavage of proteins into peptide fragments is necessary in order to
provide fine resolution. Recent methods of interfacing liquid chromatography
separation methods (such as HPLC) with tandem mass spectroscopy have already
permitted the mass spectrum analysis of protein mixtures comprising up to 200
proteins. As such proteins are analysed following protease digestion, if an average of
ten peptides per protein is assumed, then the method can analyse up to 2000 peptides.
Using methods of the present invention whereby, for example, only tagged N terminal
peptides are analysed, then up to 2000 N terminal peptides derived from up to 2000
proteins could be analysed at any one time. As this is not sensitive enough for an en
masse analysis of mammalian proteins from cells (typically 50,000 per cell), then
peptides have to be segregated into at least 25 fractions in order for these fractions all
to be analysed. Such further fractionation can be achieved either directly using a pre-
selected library of protein affinity reagents, or by the use of reagents to label internal
ends after successive protein digestion/cleavage steps following which specific protein
affinity reagents are used to fractionate peptides according to their tags. As an
alternative to standard mass spectroscopy, MALDI-ToF can be used to produce
protein mass profiles which can be compared for protein mixtures from different cells.
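The capacity arithmetic in the paragraph above can be reproduced with a short calculation. The function name is hypothetical; the figures (2000 peptides per analysis, ten peptides per protein after digestion, 50,000 proteins per cell) are those quoted in the text.

```python
import math


def fractions_needed(total_proteins, peptides_per_analysis, peptides_per_protein=1):
    """Minimum number of fractions so that each fits within one analysis."""
    total_peptides = total_proteins * peptides_per_protein
    return math.ceil(total_peptides / peptides_per_analysis)


if __name__ == "__main__":
    # One tagged N terminal peptide per protein: 50,000 / 2000 = 25 fractions.
    print(fractions_needed(50_000, 2_000))
    # Full digest at ten peptides per protein: ten times as many fractions.
    print(fractions_needed(50_000, 2_000, peptides_per_protein=10))
```

The comparison shows why restricting analysis to a single terminal peptide per protein matters: it reduces the fractionation burden by the average number of peptides per protein.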

Chemical tags are typically moieties which can be covalently attached to proteins,
usually at the N or C terminus. Chemical tagging of the N terminus is commonly
undertaken at the terminal amine group. If it is necessary to avoid tagging
of the ε-amino group of lysines, then these can be initially blocked using reagents
such as citraconic anhydride or methyl acetimidate. Terminal amine groups are then
reactive with a wide range of chemical reagents, especially isothiocyanates.
Thereby, common antibody-recognised ligands such as dinitrophenol and fluorescein
can be attached to the N terminus for subsequent fractionation using an antibody
affinity reagent. For example, the commonly used Edman reagent phenyl
isothiocyanate can be used to specifically attach to the N terminus of proteins and can
be derivatised if necessary with a moiety provided for subsequent binding to an
affinity reagent. For chemical tagging of the C terminus, methods based on
carbodiimide activation are commonly used to introduce ligands which are bound by
affinity reagents. Alternatively, addition of moieties to the C terminus of proteins has
been described using reverse proteolysis whereby certain proteases such as
carboxypeptidase Y and lysyl endopeptidase can work in reverse to add chemical tags,
commonly by way of amino acids either as derivatised amino acids with tags for
binding to an affinity reagent or by way of natural sequences of amino acids which can
then be specifically bound by an affinity reagent. It will be recognised that a wide
range of internal amino acids can also be chemically tagged including Lys via the ε-
amino group, Glu / Asp via the carboxyl group, Cys via the thiol group, Ser / Thr via
the hydroxyl group and Tyr via the hydroxyphenyl group. Specific derivatisations of
most other amino acids have been described. It will also be recognised that post-
translation protein modifications can be used for addition of chemical tags especially
with glycosylation where the sugar residues are commonly oxidised by periodate to
form aldehyde groups which can then react with amine-containing molecules. Other
modifications which can be used to add chemical tags include lipidation,
phosphorylation and metal ion addition. It will be recognised that there are a large
number of methods in the art for introducing one or more chemical tags at specific
sites within protein molecules or peptides.
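The side-chain groups listed above can be held in a small lookup table to show which positions of a peptide would compete for a given tagging chemistry. This is a sketch only; the table simply restates the residues and groups named in the text, and the function name and example peptide are illustrative.

```python
# Reactive side-chain groups named in the text, keyed by one-letter code.
SIDE_CHAIN_GROUP = {
    "K": "ε-amino",
    "E": "carboxyl",
    "D": "carboxyl",
    "C": "thiol",
    "S": "hydroxyl",
    "T": "hydroxyl",
    "Y": "hydroxyphenyl",
}


def taggable_sites(peptide, group):
    """Return (position, residue) pairs, 1-based, carrying the given group."""
    return [
        (index + 1, residue)
        for index, residue in enumerate(peptide.upper())
        if SIDE_CHAIN_GROUP.get(residue) == group
    ]


if __name__ == "__main__":
    peptide = "MDYKDDDDK"  # illustrative peptide
    # Lysines that would need blocking before amine-directed N terminal tagging.
    print(taggable_sites(peptide, "ε-amino"))
    print(taggable_sites(peptide, "carboxyl"))
```

Listing the internal ε-amino positions makes explicit why the text recommends blocking lysines first when only the terminal amine is to be tagged.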

Protein affinity reagents for use in the fifth aspect are commonly monoclonal
antibodies. For specific sequences or structures within proteins or peptides, a library
of recombinant antibody binding sites usually in the form of Fab's, Fv's or single-chain
Fv's is used where commonly the antibody binding sites are "displayed" using, for
example, bacteriophage or ribosome complexes such that the gene encoding individual
antibody binding sites can be recovered. For use in the present invention, libraries of
antibody binding sites can be dispersed into groups, for example by picking and
arraying phage plaques or picking and arraying genes in vectors for ribosome display.
Such pools will usually contain antibody binding sites for several proteins or peptides
such that the pools can be used for fractionation. Alternatively, the protein or peptide
mixture to which libraries of antibody affinity reagents are required can be
immobilised and used as the target for the pre-selection of suitable affinity reagents
which are then dispersed into pools or used as individual reagents. For chemical tags,
individual monoclonal antibodies are used to specifically bind to individual tags in
order to achieve subsequent fractionation.

The fifth aspect of the present invention includes the use of protein affinity reagents
other than monoclonal antibodies where such reagents can facilitate the fractionation
of peptides or proteins prior to mass analysis. Such affinity reagents would include
molecules of the immune system which selectively bind certain peptides such as major
histocompatibility proteins and T cell receptors. Other protein affinity reagents would
include protein domains commonly involved in protein-protein binding interactions
such as SHI domains. Included in the present invention is the concept of cyclising
peptides including within mixtures and especially when bound to solid phases by, for
example, linking cysteine residues under reducing conditions. One method for this
would be to add an additional cysteine residue at an exposed N or C terminal on
immobilised peptides using, for example for C terminal immobilised peptides,
standard conditions of peptide synthesis or using reverse proteolysis whereby certain
proteases such as carboxypeptidase Y and lysyl endopeptidase can work in reverse.
Included in the fifth aspect is also a method for further fractionating proteins or
peptides by adding, usually at the N terminus, amino acids which form part of the
recognition sequence of a protease which specifically cleaves at a recognition
sequence of two or more amino acids whereby one or more terminal amino acids in the
protease recognition site is provided by the starting protein or peptide. In this
manner, only a fraction of the proteins or
peptides to which the new amino acids are added will be then subject to terminal 
protease cleavage by virtue of the newly created sequence. In this manner, proteins or 
peptides can be tagged with additional amino acids usually at the N terminus creating, 
in a fraction of the thus tagged mixture, a specific protease cleavage site. The proteins 
or peptides can then, for example, be immobilised via the new terminus for example 
using a tagged terminal amino acid or by adding a chemical tag to the terminus, 
whereby an affinity reagent is then used to immobilise the tagged moieties. After
removing non-immobilised untagged molecules, the proteins or peptides can then be
subjected to cleavage with the specific protease which will then only cleave where the
cleavage site has been generated by a combination of synthesis-derived amino acids
and the original protein or peptide-derived amino acids. The cleaved peptides can then
be fractionated using protein affinity reagents and mass analysed (or further processed
prior to mass analysis) thus representing a subset of the peptide mixture. By using
parallel synthesis of specific amino acids to exposed termini followed by 
immobilisation and cleavage, large mixtures of proteins or peptides can be fractionated 
on the basis of their terminal amino acid(s). An example of a protease recognition site 
is ile, glu, gly, arg which is cleaved between gly and arg by Factor Xa. The sequence
ile, glu, gly could be synthesised onto the N terminus of a protein or peptide and thus
if the adjacent amino acid in the protein or peptide sequence were arg, the cleavage
site would be created and could be cleaved by Factor Xa. Other examples of protease
cleavage sites are asp, asp, asp, asp, lys, cleaved by enterokinase between asp and lys;
pro, gly, ala, ala, his, tyr cleaved between his and tyr by genease I; leu, val, pro, arg,
gly, ser cleaved between arg and gly by thrombin. N terminal addition of partial
sequence asp, asp, asp, asp could be used to identify proteins or peptides with N
terminal lys (cleaved by enterokinase), pro, gly, ala, ala, his to identify
proteins/peptides with N terminal tyr (cleaved by genease), leu, val, pro, arg to
identify N terminal gly, ser; or leu, val, pro, arg, gly to identify N terminal ser (cleaved
by thrombin). Other proteases such as the MMP's (matrix metalloproteinases) with
specific recognition sites could be used to fractionate proteins with other N terminal
amino acids. Different protease recognition sites could thus be used in combination
with the proteases to fractionate proteins or peptides according to the N terminal
amino acid. As an alternative, one or more amino acids added to the free N
terminus of a peptide could be used to create a site for binding by an affinity reagent
including where such a site is dependent on one or more of the N terminal amino acids
from the peptide. Thus, different peptides or groups of peptides could be distinguished
by the addition of amino acids to the N terminus which creates, in a manner dependent
on the N terminal amino acids, a site for protease digestion or a site for binding by an
affinity reagent. Where proteins are used as the starting material, especially from
mammalian cells whereby the N terminal amino acid is methionine, this can be removed if
required by, for example, formylation and cleavage by a bacterial protease specific for
removal of terminal formylmethionine.
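The logic of fractionating by N terminal residue can be sketched as a lookup: for each partial recognition sequence that is added, only peptides whose own N terminal residues complete the site become cleavable. The table below restates the partial sequences and completing residues given in the text; the peptide sequences and function name are hypothetical, and the sketch models only which peptides gain a complete site, not the cleavage chemistry.

```python
# (protease, partial sequence added at the N terminus, residues that must
# already be at the peptide's N terminus to complete the recognition site)
SITE_COMPLETIONS = [
    ("Factor Xa", "IEG", "R"),
    ("enterokinase", "DDDD", "K"),
    ("genease I", "PGAAH", "Y"),
    ("thrombin", "LVPR", "GS"),
    ("thrombin", "LVPRG", "S"),
]


def fractionate_by_n_terminus(peptides):
    """Group peptides by the added sequence that would create a full site."""
    fractions = {}
    for protease, added, completing in SITE_COMPLETIONS:
        fractions[(protease, added)] = [
            p for p in peptides if p.upper().startswith(completing)
        ]
    return fractions


if __name__ == "__main__":
    # Hypothetical peptide mixture, written N terminus first.
    mixture = ["RTLKA", "KAVLE", "GSTPA", "YPEDL", "STQAA", "MNPQR"]
    for key, members in fractionate_by_n_terminus(mixture).items():
        print(key, members)
```

Running parallel additions, one per row of the table, partitions the mixture into subsets defined by their terminal residues, while peptides such as the last one in the example fall into no subset and remain immobilised.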

Protein affinity reagents are an important aspect of the fifth aspect of the present
invention and can be used for both broad fractionation of groups of proteins/peptides
or for specific fractionation of individual proteins/peptides. For fractionation, it is first
necessary to prepare fractions of, or individual, protein affinity reagents which bind to
a specific fraction or specific peptide and not to other fractions/peptides. A convenient
method is to fractionate the proteins or peptides prior to isolation of the protein affinity
reagents. In the case of antibodies as the protein affinity reagents, such
proteins/peptides can then be used either to bind displayed antibodies from a library or
can be used to immunise animals for generation of antisera. Where a library of
recombinant antibody binding sites such as single-chain Fv's is used, gene clones
encoding these can be retrieved after binding to protein/peptide fractions providing a
replicable source of the affinity reagents for subsequent isolation of the specific
protein/peptide fraction. Individual single-chain Fv's may, in parallel, be screened for
binding specificity, for example by analysing peptide binding by MALDI-ToF. In this
case, single-chain Fv's which bind to a single peptide from a large protein mixture are
retained (in practice, those binding up to three peptides are also retained) as gene
clones for subsequent individual use or use within a mixture of Fv's for isolation of a
protein/peptide fraction from the mixture. It will be appreciated that free N termini
from proteins are often good targets for isolation of very specific antibodies and
therefore capture and release of N terminal peptides from a protein will particularly
favour subsequent antibody isolation. Certain Fv's may be useful for the elimination
of abundant proteins or peptides from the mixture. It will be appreciated that retention
and characterisation of the binding of single-chain Fv's may also provide a means to
reduce redundancy by eliminating Fv's with the same specificity as other Fv's.
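The retention rule described above (keep Fv's binding one to three peptides, discard Fv's that duplicate an already retained specificity) can be sketched as a simple filter. The clone names and binding data are hypothetical, and representing a specificity as the set of peptides bound is an assumption made for this illustration.

```python
def select_fvs(binding, max_peptides=3):
    """Retain clones binding 1..max_peptides peptides, one per specificity.

    `binding` maps a clone name to the collection of peptides it was
    observed to bind. The first clone seen for each specificity is kept.
    """
    retained = {}
    seen_specificities = set()
    for clone, peptides in binding.items():
        specificity = frozenset(peptides)
        if not 1 <= len(specificity) <= max_peptides:
            continue  # binds nothing, or binds too many peptides
        if specificity in seen_specificities:
            continue  # redundant with a clone already retained
        seen_specificities.add(specificity)
        retained[clone] = specificity
    return retained


if __name__ == "__main__":
    observed = {
        "Fv-A": {"pep1"},
        "Fv-B": {"pep1"},                          # same specificity as Fv-A
        "Fv-C": {"pep2", "pep3"},
        "Fv-D": {"pep1", "pep2", "pep3", "pep4"},  # too broad
        "Fv-E": set(),                             # no binding detected
    }
    print(sorted(select_fvs(observed)))
```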

The various embodiments of the fifth aspect of the present invention cover
combinations of protein digestion/cleavage, fractionation with protein affinity reagents
and mass analysis with an optional step of fractionation using affinity tags for specific
sequences or structures in the proteins or peptides, and an optional step of chemical
tagging with fractionation by virtue of these tags. The different aspects encompass
different sequences of these steps as follows:

1 - repeated digestion/cleavage cycles and mass analysis

2 - digestion/cleavage, fractionation with protein affinity reagents, mass analysis

3 - fractionation with protein affinity reagents, digestion/cleavage, mass analysis

4 - terminal chemical tagging, digestion/cleavage, fractionation with affinity reagents,
mass analysis

5 - as 3 but with additional cycle(s) of tagging, digestion/cleavage, fractionation

6 - as 4 but with repeated tagging, digestion/cleavage cycles and mass analysis

The fifth aspect of the present invention should be considered to encompass these and
related protein/peptide processing steps with the core objective of reducing the
complexity of protein mixtures in order to achieve mass analysis of the resultant
protein/peptide fractions.

The currently common method for operation of the invention involves tagging the N
and/or C terminus of a mixture of proteins (either natural or encoded by cDNA
libraries), cleaving with a protease, immobilising the N and/or C terminal peptide
fragments, and releasing and subjecting the peptides to mass analysis. Alternatively,
the N or C termini may be modified by addition of amino acids prior to cleavage with
a sequence-specific protease. Prior to mass analysis, the peptides are used to bind
protein affinity reagents such as antibodies whereby these antibodies have been pre-
selected to fractionate the peptides or are themselves retained as affinity reagents. The
mixture of proteins may be pre-fractionated, for example by size, or may be produced
from cDNA libraries which are pre-fractionated by segregation of clones. The
retained protein affinity reagents are then used to analyse complex samples of proteins
whereby the antibodies are used to bind peptides which are then mass analysed.

It will be appreciated that many of the same principles described herein for the
digestion/cleavage, fractionation and mass analysis of proteins can also be applied to
other polymeric molecules such as DNA or RNA. In the case of DNA or RNA, free
phosphate and hydroxyl groups at the 5' and 3' termini respectively provide a means
for very specific addition of chemical tags or direct binding to a solid phase. Sequence
specific restriction or modification enzymes provide for cleavage or modification of
DNA molecules. Useful affinity reagents for DNA or RNA are nucleic acids
themselves which can be specifically hybridised to a complementary DNA or RNA
sequence with attachment to a solid phase either before or after hybridisation. Using
such methods, complex mixtures of nucleic acids can be fractionated and then
subjected to mass analysis especially using mass spectrometry.

The invention is illustrated by the following examples which should not be considered
as limiting in scope:

Example 1

The experiments described in the present example were conducted using a pair of modified
single chain antibody (scAb) genes. Two modified scAbs were prepared consisting of N-
[...] (Rosenberg AH et al., Gene, 56: 125-135,
1987) which provides a T7 promoter followed by the ribosome binding site from T7 gene 10.
The scAb constructs were inserted into the vector at an NdeI site such that the sequence
encoding the epitope tag followed the first ATG of T7 gene 10. The first construct consisted of
[...] 359-365, 1995) with the FLAG epitope (MDYKDDDK) (Knappik A and Pluckthun A,
BioTechniques, 17: 754-761, 1994) added at the N terminus, and the b-zip domain of c-fos
(Abate C. et al., Proc. Natl. Acad. Sci. USA, 87: 1032-1036, 1990) at the C-terminal region of
the protein. The second consisted of a scAb constructed from the anti-foetal antigen antibody
[...] 131-140, 1994) with a poly-histidine tag at the
N terminus, and the b-zip domain of c-jun (Abate C. et al., ibid) at the C-terminal region of the
protein. [...] 340-jun scAb were constructed as follows.

DNA for the α-Ps scAb in the vector pPM1His (Molloy P et al., ibid) was amplified with the
primers RD 5' FLAG: 5'gcggatcccatatggactacaaagacgatgacgacaaacaggtgcagctgcag3'
(Genosys Biotechnologies Europe Ltd, Cambridge, UK) and RD 3':
5'gcgaattcgtggtggtggtggtggtgtgactctcc3' (Genosys) which introduced the 5' FLAG epitope
sequence and removed the 3' stop codon respectively. The reaction mixture included 0.1 µg
template DNA, 2.6 units of Expand™ High Fidelity PCR enzyme mix (Boehringer Mannheim,
Lewes, UK), Expand HF buffer (Boehringer Mannheim), 1.5 mM MgCl2, 200 µM
deoxynucleotide triphosphates (dNTPs) (Life Technologies, Paisley, UK) and 25 pmoles of
each primer. Cycles were 96°C 5 minutes, followed by [95°C 1 minute, 50°C 1 minute, 72°C
1 minute] times 5, [95°C 45 seconds, 50°C 1 minute, 72°C 1 minute 30 seconds] times 8,
[95°C 45 seconds, 50°C 1 minute, 72°C 2 minutes] times 5, finishing with 72°C 5 minutes.
The [...] bp product obtained was cut with BamHI and EcoRI and cloned into the vector
pUC19 (Boehringer Mannheim). The DNA sequence was confirmed using the Thermo
Sequenase radiolabeled terminator cycle sequencing kit with [33P] dideoxy nucleotides
(Amersham Life Science, Amersham, UK). The construct was cloned into pET5c vector
(Promega UK Ltd, Southampton, UK) as an NdeI to EcoRI fragment (see Molecular Cloning,
A Laboratory Manual, eds. Sambrook J, Fritsch EF, Maniatis T, Cold Spring Harbor
Laboratory Press 1989, New York, USA). Plasmid DNA was prepared using Wizard® Plus
Minipreps DNA Purification System (Promega UK Ltd), or for larger scale, Qiagen
Plasmid Midi Kit (Qiagen Ltd, Crawley, UK). The new plasmid generated was named pET5c
FLAG-αPs scAb.
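As a check on what an upstream tagging primer encodes, the sketch below translates the RD 5' FLAG primer sequence printed above, starting from its first ATG (the one inside the NdeI site catatg). The codon table is the standard genetic code; the function name is hypothetical, and the primer string is taken as printed in this text.

```python
# Standard genetic code, codons ordered with bases t, c, a, g.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate_from_first_atg(dna):
    """Translate from the first ATG until a stop codon or the end."""
    dna = dna.lower()
    start = dna.find("atg")
    if start < 0:
        return ""
    protein = []
    for pos in range(start, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


if __name__ == "__main__":
    rd5_flag = "gcggatcccatatggactacaaagacgatgacgacaaacaggtgcagctgcag"
    print(translate_from_first_atg(rd5_flag))
```

As printed, the primer encodes a methionine, the tag residues (with four consecutive aspartates before the lysine), and then the first residues of the antibody sequence to which the primer anneals.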

The fos cassette was assembled by PCR of overlapping oligonucleotides:
Fos1for 5'-atggaattcctcgagaccgacaccctacaggcggaaaccgaccagctgga
Fos80rev 5'-tcgcgatttcggtttgcagcgcggatttttcgtcttccagctggtcggtt
Fos71for 5'-aaaccgaaatcgcgaacctgctgaaagaaaaagaaaagctggagttcatc
Fos155rev 5'-ggaagcttgaattccgccggacggtgtgccgccaggatgaactccagctt

The above oligonucleotides were included in a reaction mix at 1pmol each, and the reaction
was driven using 10pmol primers Fos1fS; 5'-atggaattcctcgagacc and Fos155rS; 5'-
ggaagcttgaattccgcc using high fidelity polymerase and reaction components as previously.
The resulting 155bp product was digested with EcoRI, purified and cloned into EcoRI cut
pUC19 for sequence analysis using standard procedures (see Molecular Cloning, A
Laboratory Manual, ibid). The Fos cassette was sub-cloned into the pET5c FLAG-αPs scAb
plasmid as an XhoI-EcoRI fragment by substitution of the existing 320bp XhoI-EcoRI
fragment carrying the human constant region domain.

The 340 scAb was produced by substitution of the VH and VK of the 340 antibody in place of
the α-Ps VH and VK in pPM1His. The 340 VH was amplified with the primers
5'cagctgcaggagtctgggggaggcttag3' (Genosys) and 5'tcagtagacggtgaccgaggttccttgaccccagta3'
(Genosys). The reaction mixture included 0.1 µg template DNA, 2.6 units of Expand™ High
Fidelity PCR enzyme mix, Expand HF buffer, 1.5 mM MgCl2, 200 µM dNTPs and 25 pmoles
of each primer. Cycles were 96°C 5 minutes, followed by [95°C 1 minute, 50°C 1 minute,
72°C 1 minute] times 5, [95°C 45 seconds, 50°C 1 minute, 72°C 1 minute 30 seconds] times
8, [95°C 45 seconds, 50°C 1 minute, 72°C 2 minutes] times 5, finishing with 72°C 5 minutes.
The 357 bp product was cut with PstI and BstEII and cloned into PstI and BstEII cut pPM1His
(see Molecular Cloning, A Laboratory Manual, ibid). Similarly, the 340 VK was amplified
with the primers 5'gtgacattgagctcacacagtctcct3' and 5'cagcccgttttatctcgagcttggtccg3'
(Genosys). The 339 bp product was cut with SstI and XhoI and cloned into SstI and XhoI cut
modified pPM1His (produced above). The DNA sequence was confirmed using the Thermo
Sequenase radiolabeled terminator cycle sequencing kit with [33P] dideoxy nucleotides as
before. DNA for the 340 scAb in the vector pPM1His was amplified with the primers RD 5'
HIS: 5'gcggatcccatatgcaccatcatcaccatcaccaggtgcagctgcag3' (Genosys) and RD 3' (given
above) which introduced the 6 histidine residues at the 5' end and removed the 3' stop codon
respectively. Reagents and conditions for amplification were exactly as for the α-Ps construct.
The 1114 bp product obtained was cut with BamHI and EcoRI and cloned into the vector
pUC19 (see Molecular Cloning, A Laboratory Manual, ibid). The DNA sequence was
confirmed as before and the construct was cloned into pET5c vector as an NdeI to EcoRI
fragment to generate the plasmid pET5c HIS 340 scAb.

The jun cassette was assembled by PCR of overlapping oligonucleotides:
Jun1for 5'-atgagaattctcgagcgtatcgctcgtctggaagaaaaagttaaaaccct
Jun85rev 5'-tagcggtggaagccagttcggagttctgagctttcagggttttaactttt
Jun71for 5'-tggcttccaccgctaacatgctgcgtgaacaggttgctcagctgaaacag
Jun146rev 5'-catgcgaattcgtggttcataactttctgtttcagctgagcaacc

The above oligonucleotides were included in a reaction mix at 1pmol each, and the reaction
was driven using 10pmol primers Jun1for-S; 5'-atgagaattctcgagcg and Jun146rev-S; 5'-
catgcgaattcgtggttc using high fidelity polymerase and reaction components as previously. The
resulting 146bp product was digested with EcoRI, purified and cloned into EcoRI cut pUC19
for sequence analysis using standard procedures (see Molecular Cloning, A Laboratory
Manual, ibid). The Jun cassette was sub-cloned into the pET5c HIS 340 scAb plasmid as an
XhoI-EcoRI fragment by substitution of the existing 320bp XhoI-EcoRI fragment carrying the
human constant region domain.

Plasmids his-340-jun and FLAG-αPs-fos were used as templates for PCR using biotinylated
primer BioT7; 5'-agatctcgatcccgcaaatta and primer petrev; 5'-aaataggcgtatcacgaggcc. Primers
were supplied by Genosys (Cambridge, UK) and used in the reaction at a concentration of
1pmol. Components and PCR conditions were as previously. The his-340-jun reaction
product was 992bp, and the FLAG-αPs-fos reaction product was 1002bp. The products were
purified using a spin purification cartridge (Qiagen, Crawley, UK) and diluted to 100ng/µl
concentration. Quantitation was by UV absorbance at 260nm. 500ng biotin labelled DNA
was reacted with 10µl streptavidin coated magnetic particles (Bangs Labs, Fishers, USA). The
reaction was conducted in a siliconised microcentrifuge tube in a volume of 500µl PBS, 1%
(w/v) BSA for 10 minutes at room temperature. Following binding, the particles were
collected by magnet (Dynal, Bromborough, UK) and washed three times using PBS, 1% BSA.
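The quantitation and dilution steps above can be illustrated with a short calculation. It assumes the widely used conversion for double-stranded DNA of roughly 50 ng/µl per absorbance unit at 260 nm in a 1 cm path; that factor, the function names and the example readings are assumptions for the illustration and are not stated in the text.

```python
DSDNA_NG_PER_UL_PER_A260 = 50.0  # assumed conversion factor, 1 cm path length


def dsdna_concentration(a260, dilution_factor=1.0, path_cm=1.0):
    """Estimate dsDNA concentration (ng/µl) from an A260 reading."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor / path_cm


def diluent_to_add(stock_ng_per_ul, target_ng_per_ul, stock_volume_ul):
    """Volume of diluent (µl) that brings the stock to the target level."""
    if target_ng_per_ul > stock_ng_per_ul:
        raise ValueError("stock is already below the target concentration")
    return stock_volume_ul * (stock_ng_per_ul / target_ng_per_ul - 1.0)


def volume_for_mass(mass_ng, concentration_ng_per_ul):
    """Volume (µl) of solution containing the requested mass of DNA."""
    return mass_ng / concentration_ng_per_ul


if __name__ == "__main__":
    # Hypothetical reading: A260 of 0.5 measured on a 1:10 dilution.
    stock = dsdna_concentration(0.5, dilution_factor=10)
    print(stock)                            # ng/µl in the undiluted product
    print(diluent_to_add(stock, 100, 20))   # µl to add to 20 µl of stock
    print(volume_for_mass(500, 100))        # µl holding 500 ng at 100 ng/µl
```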

Following the final wash, the in vitro translation reaction was initiated by addition of 25µl T7
[...] translation mix (Promega, Southampton, UK) supplemented with
biotinyl lysine tRNA (Promega). The translation reaction was conducted at 30°C for 60
minutes then placed on ice. Particles were collected by magnet, and washed using ice cold
PBS containing 1% BSA.

In some experiments, non-magnetic streptavidin particles were used in IVTT reactions (Bangs
Labs, Fishers, USA). In such cases particles were recovered during wash cycles by
centrifugation.

In some experiments, coloured streptavidin particles, magnetic and non magnetic (Bangs
Labs), were used in IVTT reactions.

In some experiments translation products bound to the particles were detected using antibodies
for either the FLAG or the his6 epitope engineered into each of the model gene constructs.
Antibodies were added to the washed particles diluted in PBS. Incubations were for 60
minutes at 4°C with gentle mixing. A secondary reagent (anti-mouse-HRP conjugate) was
added at the recommended dilution in PBS and incubated for a further 30 minutes at 4°C.
Particles were washed three times using 200µl PBS before colour development with the
chromogenic substrate. Reactions were read at 492nm.

Protein-protein binding reactions were conducted using IVTT proteins bound to the particle
surface. In such experiments, non magnetic streptavidin particles were "captured" by
protein-protein (fos:jun) binding to the surface of magnetic particles. Magnetic particles with fos
IVTT product were mixed gently with non magnetic particles with jun bound on the surface.
The reaction was conducted in 100µl PBS, BSA and allowed to proceed at room temperature
[...]. As a negative control reaction, non-magnetic particles with a scAb protein (α-Ps
scAb; Molloy P et al., ibid) bound on the surface were mixed with the magnetic particles
coated with the fos IVTT product. Following incubation, the particles were captured by
magnet and washed six times using PBS, 1% BSA.



WO 00/57183 PCT/GB00/01015




The presence of the captured target protein gene was confirmed using PCR and DNA
sequencing. For detecting the jun model gene, jun specific primers Jun1for-S and Jun146rev-S
were used in a PCR assay. The assay was initiated by addition of 10% (v/v) particles directly
into the PCR mix. Components and reaction conditions were as previously. The 146bp jun
specific product was detected by gel electrophoresis. For detecting the fos model gene,
primers Fos1fS and Fos155rS were used in a PCR assay. Reaction conditions and detection
of the 155bp fos specific product were as above. For detecting the negative control protein,
primers Seq1scab; 5'-agatccctactataggta and Seq2scab; 5'-ggtgagctcgatgtatcc were used to
detect a 115bp product in the α-Ps scAb protein gene.

In the above experiments, jun PCR products were detected following capture by fos magnetic
particles under conditions where no α-Ps scAb PCR products could be detected following
interaction with the fos magnetic particles.

Example 2

In this example a single-chain antibody library was produced including unique peptide 
"barcodes". Human peripheral blood lymphocyte RNA was prepared according to 
standard procedures. Briefly, lymphocytes were prepared from 1 Oml heparinised 

20 blood taken from 16 normal healthy donors. Lymphocytes were collected following a 
density gradient centrifugation procedure using Lymphoprep medium (Sigma, Poole, 
UK). RNA was prepared using the QuickPrep system and instructions provided by the 
supplier (Pharmacia, St Albans, UK). Synthesis of cDNA was conducted using a 
cDNA synthesis kit (Pharmacia, St Albans, UK) and random hexamer primers with 

25 conditions recomimended by the supplier. Immunoglobulin heavy chain variable 
region (Vh) and light chain variable regions (VI) were amplified from cDNA in 
separate PGR mixes using primer sets designed to maximise Vh and VI repertoires. 
Primer sets were as described previously (Marks J.D. et al 1991, Eur. J. Immunol. 21: 
985). VH and VI PGR reactions were conducted using, 2.6 units of Expand™ High 

30 Fidelity PGR enzyme mix (Boehringer Mannheim, Lewes, UK.), Expand HF buffer 
(Boehringer), 1 .5 mM MgClj, 200 ^M deoxynucleotide triphosphates (dNTPs) (Life 
Technologies, Paisley, UK) and 25 pmoles of each primer pool. Gycles were 96^G 5 
minutes, followed by [95''C 1 minute, 50°G 1 minute, 72°C 1 minute] times 5, [95°C 
45 seconds, 5(j°G 1 minute, 72**G 1 minute 30 seconds] times 8, [95°C 45 seconds, 50° 

35 CI minute, 72°C 2 minutes] times 5, finishing with 72*'C 5 minutes. 
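As a rough illustration, the cycling programme above can be written down as data and its
length totalled. This is only a sketch: the names Step, Stage and PROGRAMME are
hypothetical, ramp times between temperatures are ignored, and the annealing temperature
of the middle stage is taken as 50°C to match the other stages, which is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: int    # block temperature in degrees Celsius
    seconds: int   # hold time at that temperature


@dataclass(frozen=True)
class Stage:
    steps: tuple   # Step objects run in order
    repeats: int   # number of times the stage is cycled


# Values transcribed from the cycling conditions in the text
# (middle-stage annealing temperature of 50 is an assumption).
PROGRAMME = (
    Stage((Step(96, 300),), 1),
    Stage((Step(95, 60), Step(50, 60), Step(72, 60)), 5),
    Stage((Step(95, 45), Step(50, 60), Step(72, 90)), 8),
    Stage((Step(95, 45), Step(50, 60), Step(72, 120)), 5),
    Stage((Step(72, 300),), 1),
)


def amplification_cycles(programme):
    """Count cycles in stages made of a denature/anneal/extend triplet."""
    return sum(stage.repeats for stage in programme if len(stage.steps) == 3)


def hold_time_seconds(programme):
    """Total programmed hold time, excluding ramping between temperatures."""
    return sum(
        stage.repeats * sum(step.seconds for step in stage.steps)
        for stage in programme
    )


if __name__ == "__main__":
    print(amplification_cycles(PROGRAMME), "amplification cycles")
    print(hold_time_seconds(PROGRAMME) / 60, "minutes of hold time")
```

Keeping the programme as data rather than prose makes it easy to compare the three
stages, which differ only in denaturation and extension times.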

In a separate PCR, a linker fragment of form (Gly4Ser)3 (Huston J.S. et al. 1988, PNAS,
85: 5879-5883) was amplified from a cloned template pSW1-ScFvD1.3 (McCafferty
et al., 1990, Nature 348: 552-554) using primer sets detailed previously (Marks, J.D.
in Antibody Engineering, ed. Borrebaeck C.A.K., New York, O.U.P., 1995). The 93bp
linker fragment product was annealed together with an equimolar mixture of the VH
and VL PCR products. The mixture was further amplified in a "pull through" reaction
using flanking primers HuVHBACKsfi and HuFORNot as detailed in Vaughan et al.
(Vaughan T.J. et al. 1996, Nature Biotech. 14: 309-314). All fragments used in the
pull-through reaction were purified free of their initial primers prior to inclusion in the
reaction. Purification was conducted using the Wizard PCR Preps system from
Promega (Promega, Southampton, UK).

The assembled contig of form VH-linker-VL was digested with restriction enzymes
SfiI and NotI (Boehringer) using standard conditions and purified as above. The
purified fragment was annealed with a double-stranded synthetic oligonucleotide
adapter mix designed to introduce a V8 protease cleavage site juxtaposed with a tract
of randomised sequence in frame with the C-terminus of the VL gene. This V8/unique
sequence barcode was produced by annealing a pair of synthetic oligonucleotide pools
of form 5'-ggccgcgaggaagaggaa[(atg)/(can)/(agn)/(aan)/(gan)/(ttn)]2gc-3' and 5'-
ggccgc[(naa)/(ntc)/(ngt)/(nct)/(nag)/(cat)]2ctccttctcctcgc-3'. This linker has NotI
compatible ends (underlined) and therefore facilitates the insertion of the complete
single chain antibody-V8/unique sequence barcode fragment into SfiI-NotI prepared
pCANTAB 5 (Pharmacia) phagemid vector.

The unique sequence barcode was designed to avoid the introduction of stop codons
and further biased to exclude encoding residues with greater than two alternative
codons. By this strategy, the number of specific oligonucleotides required to identify
a given decoded peptide sequence is minimised. In all, the unique sequence barcode
is able to encode 11 of the 20 amino acids. In addition to the V8 peptidase cleavage
site (a string of 4 glutamic acid residues), the sequence barcode is 12 codons long.
Thus, from the repertoire of 11 amino acids (10 of which are encoded by either of two
codons), the barcode is able to encode 11^12/2 = ~1.5x10^12 different peptides.
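The barcode arithmetic can be checked with a short sketch. It assumes the six codon
families given for the adapter pool (atg, can, agn, aan, gan, ttn), that "n" means any
of the four bases, and the standard genetic code; the function names are hypothetical.
The division by two in the estimate above is taken from the text as given and is not
derived here.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Codon families from the adapter pool; "N" is assumed to be any base.
FAMILIES = ["ATG", "CAN", "AGN", "AAN", "GAN", "TTN"]
BARCODE_CODONS = 12  # barcode length stated in the text


def expand(family):
    """List the concrete codons represented by one codon family."""
    choices = [BASES if base == "N" else base for base in family]
    return ["".join(codon) for codon in product(*choices)]


def barcode_alphabet(families=FAMILIES):
    """Map each encodable amino acid to the codons that specify it."""
    alphabet = {}
    for family in families:
        for codon in expand(family):
            alphabet.setdefault(CODON_TABLE[codon], []).append(codon)
    return alphabet


if __name__ == "__main__":
    alphabet = barcode_alphabet()
    print("residues:", "".join(sorted(alphabet)))
    print("stop codons present:", "*" in alphabet)
    print("peptides (11^12 / 2): %.2e" % (len(alphabet) ** BARCODE_CODONS / 2))
```

Under these assumptions the enumeration gives 11 residues, none of them a stop, with
methionine the only residue specified by a single codon.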

The assembled scFv fragment (VH-linker-VL) with SfiI and NotI prepared ends was
annealed and ligated to the NotI sequence-barcode adapter and re-purified. For
experiments expressing the human scFv library by phage display, the complete
fragment was ligated into SfiI-NotI prepared pCANTAB 5 (Pharmacia) phagemid
vector, and transformed into competent TG1 E.coli.

For other experiments using in vitro transcription and translation (IVTT), the
assembled scFv library was subcloned into SfiI-NotI prepared pCANTAB5-T7. This
vector is the same as the commercially available pCANTAB5 except it was modified
to include the T7 promoter sequence (ttaatacgactcactata) inserted at the HindIII site at
position 2235. The modification was achieved by ligation of a double-stranded
synthetic DNA linker of sequence 5'-agctaatacgactcactata into HindIII cut and
dephosphorylated pCANTAB5. Recombinant clones containing the T7 promoter were
selected using a diagnostic PCR.

Following ligation and transformation into competent TG1 E.coli, cells were grown
for 1 hour in 1ml of SOC medium and then plated onto TYE medium with 100µg/ml
ampicillin. Colonies were scraped off plates into 5ml of 2x TY broth containing
ampicillin. The cultured library was used to prepare DNA for IVTT reactions.

The pCANTAB5-T7 scFv library DNA was used in an in vitro translation reaction.
The IVTT was conducted using the T7 Quick coupled transcription translation mix
(Promega, Southampton, UK) and 10µg of the pCANTAB5-T7 scFv library DNA in a
total volume of 50µl. The translation reaction was conducted at 30°C for 90 minutes
then placed on ice. In some experiments reactions were monitored for the presence of
translation products using 35S-methionine incorporation assays. Reactions were stored
at -70°C prior to use in binding and screening assays.

The single-chain antibody library was used in a binding reaction to recombinant
human p53 protein (Oncogene Research Products-Calbiochem, Nottingham, UK). The
IVTT mix was diluted 10-fold in PBS and used in a binding assay to human
recombinant p53 protein immobilised in a 96-well microplate. The p53 protein was
immobilised by overnight incubation at a concentration of 100µg/ml in phosphate
buffer at 4°C. The plate was washed using PBS 0.5% (w/v) BSA and the diluted IVTT
mix added to the test and control wells for binding. The binding reaction was
conducted at 37°C for 90 minutes. The plate was washed x3 using PBS-T (PBS +
0.05% v/v Tween-20) and subjected to V8 protease digestion (Takara, Wokingham,
UK). Protein fragments were collected from the supernatant and size fractionated to
exclude the V8 protease and other large species before analysis by MALDI-tof.

MALDI-tof fragment analysis identified a number of peptide fragments. The peptide
sequences were used to design a set of corresponding synthetic oligonucleotides. The
oligonucleotides were used in a PCR-based screen of the single-chain library. Pfu
Turbo (Stratagene Europe) DNA polymerase was used to synthesise complementary
strands in members of the human single-chain antibody library DNA. Following 15
rounds of thermal cycling, the product was subjected to DpnI digestion. This step
depleted the mixture of parental plasmid molecules to ensure that only the newly
synthesised primed products were propagated. [illegible] of the reaction was transformed
into TG1 competent cells and plated onto LB plates containing 100µg/ml ampicillin.
Individual clones were picked, expanded and DNA prepared according to standard
procedures. The DNA was used directly in a second round of screening involving
IVTT, antigen binding, V8 protease digestion and MALDI-tof fragment analysis. After 2
rounds of selection 6 scFvs were isolated which bound recombinant p53.
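The step of designing oligonucleotides from decoded barcode peptides can be sketched as
a back-translation. The sketch assumes the barcode residues and their two-codon choices
follow from the six codon families of the adapter pool; the function name and the example
peptide are hypothetical, and no real fragment data are implied.

```python
from itertools import product

# Codons available to each barcode residue, assuming the six codon families
# ATG, CAN, AGN, AAN, GAN and TTN described for the adapter pool.
BARCODE_CODONS = {
    "M": ["ATG"],
    "H": ["CAT", "CAC"], "Q": ["CAA", "CAG"],
    "S": ["AGT", "AGC"], "R": ["AGA", "AGG"],
    "N": ["AAT", "AAC"], "K": ["AAA", "AAG"],
    "D": ["GAT", "GAC"], "E": ["GAA", "GAG"],
    "F": ["TTT", "TTC"], "L": ["TTA", "TTG"],
}


def candidate_oligos(peptide):
    """Return every coding sequence that could have produced the peptide."""
    try:
        options = [BARCODE_CODONS[residue] for residue in peptide]
    except KeyError as err:
        raise ValueError(
            f"residue {err.args[0]!r} cannot occur in the barcode"
        ) from None
    return ["".join(codons) for codons in product(*options)]


if __name__ == "__main__":
    oligos = candidate_oligos("MKD")  # hypothetical three-residue fragment
    print(len(oligos), oligos)
```

Because no residue has more than two codons, a peptide of n residues maps to at most
2^n coding sequences, which is the sense in which the barcode design keeps the number
of oligonucleotides per decoded peptide small.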

Example 3 

The experiments described in the present example were conducted using a Fab
expression vector pC5A8-03, the construction of which is as follows. The vector
pC5A8-01 is based on the vector pLITMUS28 (New England Biolabs, MA, USA)
which provides an inducible lac promoter and an M13 origin of replication. The Fab
region of the antibody was assembled from two DNA fragments encoding the variable
region (VH) and first constant region (CH1) of the heavy chain and the variable region
(VK) and constant region (CK) of the kappa light chain of a humanised monoclonal
antibody 5A8 directed against CD4 (Reimann KA. et al., AIDS Research and Human
Retroviruses 13, 11: p933, 1997). These fragments were fused to the pelB leader
sequence (Lei S-P. et al., Journal of Bacteriology, 169: 9: 4379-4383, 1987) and
inserted between the BglII and BsmI restriction sites of pLITMUS28 as described
below. All following molecular biology procedures will be familiar to those skilled in
the art and can be found in Molecular Cloning, A Laboratory Manual, eds. Sambrook
J., Fritsch EF. and Maniatis T., Cold Spring Harbor Laboratory Press 1989, New York,
USA. All oligonucleotides were synthesised by Genosys Biotechnologies Europe Ltd,
Cambridge, UK. Unless otherwise stated, all restriction endonucleases were purchased
from Life Technologies, Paisley, UK. All polymerase chain reactions were carried out
using Pfu DNA polymerase (Promega, Southampton, UK).

In order to assemble the light chain fragment, the pelB leader sequence was amplified
using the polymerase chain reaction (PCR), using a Hybaid Touchdown Thermal
Cycler, from clone pPM1-HIS which contains a single-chain antibody fragment
(scAb) against Pseudomonas aeruginosa (Molloy, P. et al. Journal of Applied
Bacteriology, 78: 359-365, 1995). This initial reaction was carried out using
oligonucleotides OL001, which encodes a BglII restriction site, the N-terminal residues
of the pelB leader sequence and the Shine Dalgarno sequence, and OL002, which
encodes the C-terminus of the pelB leader and the N-terminal residues of a kappa
light chain from pDIVKV3 (ref). The product of this reaction was purified from
NuSieve GTG agarose (Flowgen, Lichfield, UK) using a Wizard® PCR purification
kit (Promega UK Ltd., Southampton, UK), denatured and used, in conjunction with
OL004, which encodes the junction of the variable and constant regions of the kappa
light chain, to amplify the variable region of the kappa chain from clone pDIVKV3, by
PCR using standard protocols. The constant region of the kappa light chain was
amplified, by PCR, from clone pPM1-HIS (Molloy et al.) using OL003, which encodes
the C-terminal residues of the variable region and the N-terminal residues of the
constant region, and OL005, which encodes the C-terminal residues of the constant
region of the kappa light chain and the restriction enzyme site EcoRI. These two
fragments were subsequently amplified by overlap PCR using OL001 and OL005,
digested with BglII and EcoRI and cloned into pLITMUS28 in order to produce
pC5A8-01.

The heavy chain was assembled by amplification of the pelB leader sequence from
the assembled light chain using OL006, which encodes an EcoRI site and the Shine
Dalgarno sequence, and OL007, which encodes the C-terminal residues of the pelB
leader sequence and the N-terminal residues of a heavy chain from pDIVHV4. The
product of this reaction was used, alongside OL009, which encodes the junction of the
variable and constant regions of the heavy chain, to amplify the variable region of the
IgG1 heavy chain from clone pDIVHV4. Exon 1 of the heavy chain constant region
was amplified, by PCR, from clone pSVgptHuIgG1 using OL008 and OL010, which
encode the C-terminal residues of the variable region of the heavy chain and the
N-terminus of the constant chain (OL008) and the C-terminal residues of exon 1 and the
restriction site for SstI (OL010). The products of these reactions were amplified by
overlap PCR using OL006 and OL010, digested with EcoRI and SstI and cloned into
pLITMUS28 containing the light chain fragments in order to produce pC5A8-02.

The C-terminal residues of CH1 and a C-terminal FLAG tag sequence (DYKDDDDK)
(Knappik A. and Pluckthun A. Biotechniques, 17: 754-761, 1994) were added using
OL011 and OL012 which included the restriction sites EcoICRI and BsmI in order to
produce pC5A8-03. Alternatively, these tags could include the 6HIS tag or MS tags
(see example).

The oligonucleotides utilised in the production of pC5A8-01, pC5A8-02 and pC5A8-03
are listed below:

OL001; 5' GGGCAGATCTTTAACTTTAAGAAGGAGATATACATATGAAATACCTATTGCCTACGG 3'
OL002; 5' GGGTCTGGGTCATAACGATATCGGCCATCGCTGGTTGGGCAGC 3'
OL003; 5' GGTACCAAACTGGAGATCAAACGGACTGTGGCTGCACCATCT 3'
OL004; 5' AGATGGTGCAGCCACAGTCCGTTTGATCTCCAGTTTGGTACC 3'
OL005; 5' GATCGAATTCCTAAGACTCTCCGCGGTTGAAGCTCTTTG 3'
OL006; 5' GATCGAATTCTAACTTTAAGAAGGAGATATACATATG 3'
OL007; 5' GGACTGAACCAGTTGGACTTCGGCCATCGCTGGTTGGGCAGC 3'
OL008; 5' ACCCTGGTTACCGTCTCCTCAGCCTCCACCAAGGGCCCATC 3'
OL009; 5' GATGGGCCCTTGGTGGAGGCTGAGGAGACGGTAACCAGGGTAC 3'
OL010; 5' GATCGAGCTCTGCTTTCTTGTCCACCTTGGTGTTGC 3'
OL011; 5' CCCAAATCTTGCGCTGCAGACTACAAAGACGACGACGACAAATAGCTCGAGC 3'
OL012; 5' TTAAGCTCGAGCTATTTGTCGTCGTCGTCTTTGTAGTCTGCAGCGCAAGATTTGGG 3'

The production of functional Fab was demonstrated by ELISA. In summary, the above
vector was transferred into E.coli strain DH5α and grown at 37°C in the presence of
100 µg/ml ampicillin and 1% glucose until an OD600 of 0.5 was attained. Protein
production was induced by the addition of 1mM isopropylthio-β-D-galactoside (IPTG)
in the absence of glucose. The periplasmic fraction was released by osmotic shock
using 30mM Tris HCl, 20% sucrose pH8.0, 1mM EDTA followed by 5mM MgSO4
(Molloy, P. et al. Journal of Applied Bacteriology, 78: 359-365, 1995) and added
directly to an Immulon 4 ELISA plate (Dynex) which had previously been coated
overnight with soluble human CD4 (Intracel Corp., Issaquah, WA) at a concentration
of 1µg/ml in phosphate buffered saline (PBS) pH7.4, at room temperature in a
humidified chamber. Alternatively, the periplasmic fraction could be released by cell
lysis or by the addition of 1mM EDTA. Non-specific binding was reduced by
incubating the plate for 1 hour at room temperature with PBS containing 0.05% Tween
20, 2% bovine serum albumin (BSA) and 0.05% thimerosal (Sigma) prior to addition
of the soluble Fab. The anti-CD4 specific Fab was detected using goat anti-human
IgG Fab specific horseradish peroxidase conjugate (Sigma, UK) which was itself
detected using 3,3',5,5'-tetramethylbenzidine dihydrochloride (TMB) (Sigma, UK) and
hydrogen peroxide in phosphate/citrate buffer pH5.0. Colour development was
stopped after 30 minutes using 0.2N H2SO4 and the absorbance monitored at 450 nm.
Alternatively ABTS/citrate (Sigma, UK) could be used for detection.

In order to produce a library of CDR3 sequences, unique restriction sites were introduced into
vector pC5A8-03 by oligonucleotide-directed mutagenesis (Kunkel TA. Proc. Natl Acad Sci
USA: 488-492 (1985) and Current Protocols in Molecular Biology eds. Ausubel FM., Brent
R., Kingston RE., Moore DD., Seidman JG., Smith JA., Struhl K. John Wiley & Sons, Inc.)
using the oligonucleotides listed below. The presence of the AatII and HindIII (5' and 3' to
LCDR3) and BssHII and SanDI (5' and 3' to HCDR3) restriction sites in the kappa light
chain and the heavy chain respectively was confirmed by digestion with the appropriate
restriction enzymes. These plasmids, each containing an additional restriction site, were
designated pC5A8-04 to pC5A8-07.

OL013; 5' GAAGACGTCGCTGTTTAC 3' 
OL014; 5' GGTACCAAGCTTGAGATC 3' 
OL015; 5' CTACTGCGCGCGTGAAAAAG 3'
OL016; 5' GGGTCAGGGGACCCTGG 3' 

Following digestion of pC5A8-07 with AatII and HindIII, the highly variable residues in
CDR3 of the kappa light chain variable regions were randomised using a mixture of
degenerate oligonucleotides carrying the anchor residues (aa 83-88 and aa 97-103) and a 10
nucleotide palindromic sequence at their 3' end which encompasses the restriction
endonuclease site for HindIII. These oligonucleotides hybridise at their 3' ends and then act as
a substrate for DNA polymerase resulting in the production of double-stranded homoduplex,
which is digested with the two restriction enzymes and cloned into the digested vector using
standard protocols (see Current Protocols in Molecular Biology eds. Ausubel FM., Brent R.,
Kingston RE., Moore DD., Seidman JG., Smith JA., Struhl K. John Wiley & Sons, Inc.). The
oligonucleotides were prepared such that residues 91, 92, 93, 94, 95, 95A, 95B and 96 were
randomised by the inclusion of equal concentrations of each nucleotide at each step of the
oligonucleotide synthesis (Genosys, Cambridge, UK).

The sequence of the mutagenic oligonucleotides is based on a CDR3 length of 10
residues. Residues 89 and 90 are relatively conserved and are therefore fixed in this
example. The residues to be randomised are shown in italics. Additional libraries with
a CDR3 of 6, 7, 8 or 9 residues can also be created by varying the length of the
randomised region.

Positive strand; 5'
GAAGACGTCGCTGTTTACTACTGCCAGCAGAW5iV7V5;v^^
CTTCGGTGGTGGTACCAAGCTTGG 3'

Negative strand; 5'
CCAAGCTTGGTACCACCACCGAAGGT5//A^57W5^
GCAGTAGTAAACAGCGACGTCTTC 3'

CDR3 of the heavy chain was randomised using the restriction endonuclease sites
BssHII and SanDI and the mutagenic oligonucleotides listed below, in a similar
manner to that described in the previous section. In this case residues 95-100D are
randomised. The residues to be randomised are shown in italics. Additional libraries
with a CDR3 of 9, 11 or 12 residues can also be created by varying the length of the
randomised region.

Positive strand; 5'
CTACTGCGCGCGimSNNSmsmsmSNNSNNSNNSNNSmSTTCGC^
GGGTCAGGGGACCCCT

Negative strand; 5'
AGGGGTCCCCTGACCCCAGTAAGCGAA5A/7V57^^
7VS7V7WVCGCGCGCAGTAG 3'

A library in which both the heavy and light chains contained a randomised CDR3 was
produced by carrying out both the heavy and light chain mutagenesis methods
described above.

In order to increase the efficiency of selection of high affinity binders, the FLAG tag
mentioned above was replaced with a mass tag using the restriction endonucleases PstI
and XhoI. In order to increase the library size further two tags can be used. In this case
the tags must differ in length by at least two residues in order to be distinguished
following the removal of tags 1 and 2 with a protease such as Factor Xa. The
oligonucleotides were designed with a palindromic sequence at their 3' end which
encompasses the restriction endonuclease site for XhoI. The oligonucleotides hybridise
at their 3' ends and then act as a substrate for DNA polymerase resulting in the
production of double-stranded homoduplex, which is digested with the two restriction
enzymes PstI and XhoI and cloned into the digested vector using standard protocols
(see Current Protocols in Molecular Biology eds. Ausubel FM., Brent R., Kingston
RE., Moore DD., Seidman JG., Smith JA., Struhl K. John Wiley & Sons, Inc.).

As an example a tag of 8 residues can be created using the oligonucleotide 5' NAC
NCC NGG NTG TKC VAG GNV CNT 3'. The length of this tag is increased to 11
residues if a second tag of 8 residues is also included due to the incorporation of the
site for protease Factor Xa, which is shown in italics. This allows the tags to be
identified as tag 1 or tag 2 following their removal and analysis by mass spectroscopy.
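The diversity available from the 8-residue tag oligonucleotide can be estimated with a
short sketch. It assumes standard IUPAC ambiguity codes (N = any base, K = G or T,
V = A, C or G) and the standard genetic code; the function names are hypothetical and
the counts are a property of the degenerate sequence only, not of any library actually
made.

```python
from itertools import product
from math import prod

# IUPAC nucleotide ambiguity codes used in the tag oligonucleotides.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "K": "GT", "M": "AC", "Y": "CT",
    "B": "CGT", "V": "ACG", "N": "ACGT",
}

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

TAG_OLIGO = "NAC NCC NGG NTG TKC VAG GNV CNT"  # 8-codon tag from the text


def residues_at(codon):
    """Set of amino acids (or '*' for stop) a degenerate codon can encode."""
    concrete = product(*(IUPAC[base] for base in codon))
    return {CODON_TABLE["".join(c)] for c in concrete}


def tag_diversity(oligo=TAG_OLIGO):
    """Return (distinct DNA sequences, distinct peptides) for the oligo."""
    codons = oligo.split()
    dna = prod(len(IUPAC[base]) for codon in codons for base in codon)
    peptides = prod(len(residues_at(codon)) for codon in codons)
    return dna, peptides


if __name__ == "__main__":
    for codon in TAG_OLIGO.split():
        print(codon, "".join(sorted(residues_at(codon))))
    print(tag_diversity())
```

The peptide count is lower than the DNA count because several degenerate codons encode
the same residue more than once, for example arginine at NGG and leucine at NTG.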

Single tag.

Forward Oligo; 5' GCG CTG CAG GAY GGN CGN NAC NCC NGG NTG TKC VAG GNV CNT
TAG CTC GAG CTA 3'

Reverse Oligo; 5' TAG CTC GAG CTA ANG BNC CTB GMA CAN CCN GGN GTN CCG
CCC GTC CTG CAG CGC 3'

Double tag.

Forward Oligo; 5' GCG CTG CAG GAY GGN CGN NAC NCC NGG NTG TKC VAG GNV CNT
GAY GGN CGN NAC NCC NGG NTG TKC VAG GNV CNT TAG CTC GAG CTA 3'

Reverse Oligo; 5' TAG CTC GAG CTA ANG BNC CTB GMA CAN CCN GGN GTN CCG CCC
GTC ANG BNC CTB GMA CAN CCN GGN GTN CCG CCC GTC CTG CAG CGC 3'

Example 4 



In order to select high affinity binders, the initial library was transferred into E. coli
DH5α by electroporation (Bio-Rad) and plated onto L agar containing 100µg/ml
ampicillin and 1% glucose and incubated at 37°C overnight. The transformed cells
were harvested and used to inoculate a fresh batch of L broth containing 100µg/ml
ampicillin. The remainder of the library should be retained and stored at -10°C and
used as starting material for the rescue of high affinity clones, as described later. The
newly inoculated cultures were incubated for 2 hours at 37°C prior to the addition of
isopropylthio-β-D-galactoside (IPTG) to a final concentration of 0.1mM. The cultures
were then incubated at 37°C for a further 3 hours.

100 ml cultures of bacteria producing the soluble Fab library were centrifuged at 4000
rpm for 20 minutes at 4°C and the resulting pellet resuspended in phosphate buffered
saline containing 1 mM EDTA. Following agitation for 5-20 minutes on ice, the
EDTA permeabilises the outer membrane and allows the periplasmic contents to leak
out. The suspension was then clarified by centrifugation and the supernatant used in
subsequent steps. Alternative protocols for the release of the periplasmic contents
could also be utilised (Molloy, P. et al. Journal of Applied Bacteriology, 78: 359-365,
1995 and Molecular Cloning, A Laboratory Manual eds Sambrook J., Fritsch EF. and
Maniatis T. Cold Spring Harbor Laboratory Press 1989, New York, USA).

The periplasmic extract, containing the Fab library, was aliquoted into Nunc-Immunotubes
which had been coated overnight with soluble human CD4 (Intracel
Corp., Issaquah, WA) at a concentration of 1µg/ml in phosphate buffered saline (PBS)
pH7.4, at room temperature in a humidified chamber. Non-specific binding was
reduced by incubating the tubes for 1 hour at room temperature with PBS containing
0.05% Tween 20, 2% bovine serum albumin (BSA) and 0.05% thimerosal (Sigma)
prior to addition of the soluble Fab. After allowing the Fab to bind to the CD4 antigen
for 1 hour at room temperature, the unbound Fab was eliminated by washing the tubes
20 times with PBS, 0.05% Tween 20.

In order to identify the amino acid sequence of those Fabs which remain bound, the
mass tag was removed with Factor Xa using standard protocols. The mass tag was
then analysed by MALDI-TOF (MS/MS) spectrometry in which the molecular weight
of each tag was determined, then the sequence information obtained by analysis of the
secondary ionisation events. By combining this information the amino acid sequence
of the tags could be assigned.

In some instances it may be necessary to increase the efficiency of protease cleavage
by eluting the bound Fab, neutralising and purifying the Fab from the other E. coli
proteins by affinity purification using a sepharose-anti Ck column (Pierce Warriner,
Cheshire, UK) prepared according to the manufacturer's instructions. The mass tag can
then be removed from the bound Fab using Factor Xa.
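The MALDI-TOF step described above assigns each mass tag from its molecular weight and
fragmentation data. A minimal sketch of the molecular weight part follows, assuming
unmodified peptides and standard monoisotopic residue masses. The function names and the
two example tags are hypothetical (chosen from residues the 8-codon tag oligonucleotide
can encode), and fragmentation is not modelled.

```python
# Monoisotopic residue masses in daltons for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free N- and C-termini
PROTON = 1.00728   # added for a singly protonated ion


def monoisotopic_mass(peptide):
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER


def singly_charged_mz(peptide):
    """m/z of the [M+H]+ ion, the species usually observed in MALDI."""
    return monoisotopic_mass(peptide) + PROTON


if __name__ == "__main__":
    # Hypothetical tags differing only at the sixth position (K versus Q).
    for tag in ("NTRMFKDH", "NTRMFQDH"):
        print(tag, round(singly_charged_mz(tag), 4))
```

Tags that differ only by lysine versus glutamine are separated by about 0.036 Da, and
tags with the same composition in a different order have identical masses, which
illustrates why the text combines molecular weight with MS/MS sequence information.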

Following the identification of the mass tag, a further two oligonucleotides were
produced. The 3' oligonucleotide encodes the sequence of the mass tag while the 5'
oligonucleotide is OL001 which encodes the sequence at the N-terminus of the Fab.

Positive strand; 5' GG GCA GAT CTT TAA CTT TAA GAA GGA GAT ATA CAT
ATG AAA TAC CTA TTG CCT ACG G 3'

Negative strand; 5' TAG CTC GAG CTA ANG BNC CTB GMA CAN CCN GGN GTN CCG CCC
GTC ANG BNC CTB GMA CAN CCN GGN GTN CCG CCC GTC CTG CAG 3'

The clone containing the high affinity binder was rescued by adding 10 µl of the E.
coli library to a PCR reaction containing the oligonucleotides described above. The
conditions required for this reaction may vary depending upon the oligonucleotides
being utilised. Following amplification, the PCR product was sequenced and
subsequently purified from low melting point agarose, digested with AatII, which
occurs at the N-terminus of CDR3 of the kappa light chain, and SanDI, which occurs at
the C-terminus of CDR3 of the heavy chain in vector pC5A8-07, and transferred into
vector pC5A8-07 which had been digested with the same restriction endonucleases
using standard protocols (see Molecular Cloning, A Laboratory Manual eds Sambrook
J., Fritsch EF. and Maniatis T. Cold Spring Harbor Laboratory Press 1989, New York,
USA). The resulting plasmid was transferred into E. coli DH5α by electroporation
using standard protocols and stored at -70°C. Alternatively, the product of the PCR
reaction could be digested with a number of alternative restriction endonucleases and
transferred into alternative vectors for Fab expression.

In some cases a number of mass tags may be present following the initial round of
panning. In this case, a library of clones is amplified from the stored library using a
mixture of 3' oligonucleotides. This limited library can then be subjected to further
rounds of panning, the bound clones can be re-analysed by MALDI-TOF and the
sequence of the internal tags used to create a limited repertoire of PCR primers.

In order to confirm the affinity of the selected anti-CD4 specific Fab, periplasmic
extracts should be prepared as described above and used immediately in a CD4
specific ELISA. The apparent affinity is a combination of the actual affinity and the
concentration of the Fab; therefore the concentration of the Fab should be established
by carrying out an additional capture ELISA on the same extract in which a standard
concentration curve is produced against the FLAG tag or the human Ck domain
(McGregor DP., Molloy PE., Cunningham C. and Harris WJ. Molecular Immunology
31, 219-226, 1994).



Example 5

In this example, human p53 protein was modified with a chemical tag at its N-terminus,
cleaved with a protease, the chemically tagged peptide then recovered using
a tag-specific monoclonal antibody and the peptide then analysed by MALDI-ToF.

p53 protein was a gift from Dr Borek Vojtesek (University of Brno, Czech Republic).
100µg of p53 protein was reacted with the succinimide ester of (methyl sulphonyl) ethyl
carbonate according to Mikolajczyk et al., Bioconjugate Chem., vol 7 (1996) p150-158 in
order to block lysine side-chains. The blocked protein was dissolved at 1mg/ml in 0.1M
sodium bicarbonate buffer pH8.5 and NHS-SS-biotin (Pierce, Chester, UK) was added
to 100µg/ml final. The reaction was carried out for 6 hours at room temperature and
terminated with ethanolamine. The protein mixture was then passed down a Sephadex
G25 column (Pharmacia, Milton Keynes, UK) in PBS and the void volume collected
using A280 measurements of the eluates. 40µl of eluate containing 2µg p53 was then
heat denatured (95°C for 5 mins), cooled to 37°C and 1µg endoproteinase Arg-C (from C.
histolyticum, Calbiochem, Nottingham, UK) was added and the mixture incubated at
37°C for 1 hour. Then 10µl of streptavidin-agarose (Sigma, Poole, UK) in PBS was
added and the mixture shaken for 10 minutes. The agarose was pelleted at 16000g for
1 min and washed three times in TSO buffer (75mM Tris.HCl, 200mM NaCl, 0.5%
N-octyl glucoside, pH8) and three times in TSMK (10mM Tris.HCl, 200mM NaCl, 5mM
2-mercaptoethanol, pH8). Finally, 10µl of a saturated solution of
alpha-cyano-4-hydroxycinnamic acid in 1% aqueous trifluoroacetic acid/acetonitrile
(1:1 v/v) was added to the washed beads and 1µl of this was loaded onto the mass
spectrometer chip. The analysis was carried out using a Perseptive Biosystems
Voyager-DE STR Biospectrometry Workstation (Perseptive Biosystems). The mass spectra
were collected by adding spectra from 200 laser shots.



The results showed a major peak corresponding to the 65 amino acid N-terminal
Arg-C endoprotease fragment with no significant levels of other p53 Arg-C peaks.

Example 6

The method of example 5 was repeated except that the N-terminal biotin-tagged
peptide was used to isolate a single-chain Fv antibody fragment from a phage display
library of single-chain Fvs. Subsequently, the single-chain Fv was used to isolate the
N-terminal peptide fragment from a protease digest of the test protein as confirmed by
MALDI-ToF. An extract of normal human brain, prepared as in example 4, was
conjugated to KLH according to Harlow and Lane, "Antibodies" (1988) (Cold Spring
Harbor Publications) and used to immunise two BalbC mice. 2 doses were given
intra-peritoneally with an interval of 4 weeks between them. 3 to 4 days after the 2nd
inoculation, the mice were sacrificed and spleens removed by dissection. Spleen
mRNA preparation was then initiated using QuickPrep™ mRNA purification kit
(Pharmacia) according to the manufacturer's instructions.

The Pharmacia Recombinant Phage Antibody System (Pharmacia) was used to
produce a library of mouse single chain Fvs (ScFv). First-strand cDNA was generated
from the mRNA using M-MuLV reverse transcriptase and random hexamer primers.
Antibody heavy and light chain genes were then amplified using specific heavy and
light chain primers complementary to conserved sequences flanking the antibody
variable domains. The 340 and 325 base pair products generated for heavy and light
chain DNA respectively were separately purified following agarose gel
electrophoresis. These were then assembled into a single ScFv construct using a DNA
linker-primer mix to give the VH region joined by a (Gly4Ser)3 peptide to the VL
region. The assembled ScFv were amplified with primers designed to insert SfiI and
NotI sites at the 5' and 3' ends respectively, giving an 800 bp product. This fragment
was purified, sequentially digested with SfiI and NotI, and repurified. The fragment
was then ligated into SfiI and NotI cut pCANTAB 5 phagemid vector. pCANTAB 5
contains the gene encoding the phage gene 3 protein (g3p) and the ScFv is inserted
adjacent to the g3 signal sequence such that it will be expressed as a g3p fusion
protein. Competent E.coli TG1 cells were transformed with the pCANTAB 5/ScFv
phagemid then subsequently infected with the M13K07 helper phage. The resulting
recombinant phage contained DNA encoding the ScFv genes and displayed one or
more copies of recombinant antibody as fusion proteins at their tips.

Phage-displayed ScFv that bind to the peptides were then selected or enriched by
panning. Briefly, the biotinylated and protease treated p53 preparation from example
1 was applied to a streptavidin-coated glass slide (Radius Biosciences, Waltham,
USA) and the slide was washed four times in PBS. After blocking with 2% non-fat
dry milk in PBS, the phage preparation was applied and incubated for 1 hour. After
washing 10 times with TBS/0.05% Tween 20, peptide reactive recombinant phage
were detected with horseradish peroxidase conjugated anti-M13 antibody and
revealed with o-phenylene diamine chromogenic substrate. These phage were
subsequently eluted with 0.1M glycine.HCl pH2.2 and 1mg/ml BSA and neutralised
with 2M Tris base. The eluted phage were amplified in JM103 grown in 25ml J broth.

Two additional rounds of panning were undertaken and finally 10 single plaques were
isolated, pooled and further amplified. An aliquot of 10^10 amplified phage was
incubated for 2 hours at 4°C with 0.1µg of biotinylated and endoproteinase Arg-C
digested p53 in TSO buffer. After 2 hours, 0.5µg of anti-M13 (Pharmacia) in TSO
was added and incubated for 1 hour following which 5µl of protein A/G agarose
(Sigma) was added and the mixture incubated for a further 0.5 hours with swirling.
The agarose beads were then pelleted, washed as in example 1 above and analysed by
mass spectrometry.

The results showed the same major peak as in example 1 corresponding to the 65
amino acid N-terminal Arg-C endoprotease fragment.

Example 7 

In this example, a gene fragment encoding a test protein was subjected to priming with
a synthetic oligonucleotide encoding a polyhistidine tag. The cDNAs were expressed
by in vitro transcription and translation (IVTT) and the tagged peptide fragments were
then isolated using a nickel chelate column. These fragments were then used to isolate
a single-chain Fv antibody fragment. Subsequently, the single-chain Fv was used to
isolate a peptide fragment from a protease digest of the test protein as confirmed by
mass spectrometry.

Example 8 

The method of example 6 was repeated using a total protein preparation from cells and
the chemically tagged peptides were used to isolate a collection of single-chain Fv
antibody fragments. Subsequently, a mixture of twelve of these single-chain Fvs was
used to isolate peptide fragments from a protease digest of the test protein and
analysed by mass spectrometry.



